When building out the Notia client, we found a real lack of resources around building a persistently authenticated Python library.
To address this, we're going to build an interactive, authenticated Python CLI that uses the Twitter API to fetch the top Machine Learning tweets of the week! You can see the final result in the video demo above, or you can skip to the final code here.
Building this CLI will let us explore concepts like authenticating a local device between uses, accepting CLI arguments with Click, and displaying our data interactively with Rich.
Twitter API Authentication
The Twitter API offers a few different methods of authentication depending on your use case. We will only be looking to query publicly available information, so the simple OAuth 2.0 authentication scheme is perfect. The image below from their documentation shows how simple the flow is:

All we need to do is provide the Client ID and the Client Secret using Basic authentication to retrieve a Bearer Token. After that, we simply provide the token with each subsequent request.
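The Basic step above is nothing exotic: it's just the Client ID and Client Secret joined with a colon and base64-encoded into an `Authorization` header. Here's a quick sketch with dummy placeholder credentials (only the token endpoint URL is real):

```python
import base64

# Hypothetical credentials -- substitute your real Client ID and Secret.
client_id = "my-client-id"
client_secret = "my-client-secret"

# HTTP Basic authentication is just "Basic " + base64("id:secret").
credentials = f"{client_id}:{client_secret}".encode("ascii")
auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")

# The token request would then look like:
#   POST https://api.twitter.com/oauth2/token
#   Authorization: <auth_header>
#   Body: grant_type=client_credentials
print(auth_header)
```

The library we use later builds this header for us, but it's useful to know there's no magic underneath.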
To get started, let's sign up as a developer on the Twitter Developer Dashboard here. Make sure to note down your Client ID, Client Secret and your app name - you'll need them later.
Project Setup
Next, let's start a new Python project with Poetry and add our required dependencies. We've called our project Slice of Machine Learning!

```shell
poetry new sliceofml
cd sliceofml
poetry add click rich requests-oauthlib
```
Basic CLI with Click
Click is the de facto standard for building intuitive CLIs in Python. Let's mock up a basic CLI that has the interface we want to expose to our users. We can create a file, `cli.py`, and point Poetry at it in our `pyproject.toml`:

```toml
# pyproject.toml
[tool.poetry.scripts]
sliceofml = "sliceofml.cli:cli"
```
Next, let's set up our stub functions to test out how our users will interact with our library. Our library will expose just two commands:

- `login` - Checks if the user is already logged in, and if not, prompts them. They can also use the `--relogin` flag to forcefully update their credentials.
- `slice` - Fetches the requested time range of tweets from Twitter and displays them.
The simple interface below achieves what we are looking for:

```python
# cli.py
import click


@click.group()
def cli():
    """Slice of ML or sliceofml is your little 🍰 of ML."""


@cli.command("login")
@click.option("--relogin", "-r", is_flag=True)
def login(relogin):
    click.echo(relogin)


@cli.command("slice")
@click.option("--daily", "frequency", flag_value="daily", default=True)
@click.option("--weekly", "frequency", flag_value="weekly")
def slice(frequency):
    click.echo(frequency)
```
Using `poetry shell`, let's try out our fancy new commands and see how the interface looks.
Persistent Authentication
Now that we have our interface defined, let's start building out our authentication functionality. In order to keep our local device authenticated between uses, we need to store our Bearer Token somewhere.
For that, we will use the `~/.netrc` file. This pattern has a long history, and is currently used by some popular CLIs such as the Heroku CLI. The `netrc` file format is not particularly well defined (as excellently explained here), but it will work great for our purposes.

The core of the `~/.netrc` format is a simple entry with 3 fields. Let's see how an entry would look for our app:

```
machine api.twitter.com
  login <APP_NAME>
  password <BEARER_TOKEN>
```
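To see the format in action, here's a small sketch (with a dummy token) showing that the standard library's `netrc` module can already parse an entry like this for us:

```python
import netrc
import os
import tempfile

# A dummy entry in the exact 3-field format shown above.
entry = "machine api.twitter.com login SliceOfML password AAAA-dummy-token\n"
with tempfile.NamedTemporaryFile("w", suffix="netrc", delete=False) as f:
    f.write(entry)
    path = f.name

# The stdlib `netrc` module can *read* this format; authenticators()
# returns a (login, account, password) tuple for the given machine.
auth = netrc.netrc(path).authenticators("api.twitter.com")
login, _account, password = auth
os.unlink(path)
print(login, password)
```

Reading is the easy half; as we'll see shortly, writing is where we need to do a little extra work.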
Collecting credentials 🔑
We need to collect the Client ID, Client Secret and App Name from our users and create an entry in the `netrc` file. We could collect these from the user using plain `print` and `input` statements, but it would be nice to make our CLI a little more... lively.
For this, let's reach for one of the best Python libraries out there - Rich. Rich is an awesome library for building TUIs (terminal user interfaces), featuring tons of useful functions and classes to make building UIs easy.
Let's create a new file, `display.py`, and create a new `Display` class. This class will abstract over the top of the Rich `Console` API to create some functions we can reuse throughout our CLI. You can see our `Display` class below:

```python
# display.py
from typing import Optional

from rich.console import Console


class Display:
    def __init__(self) -> None:
        self._console = Console()

    def log(self, msg_obj=None) -> None:
        self._console.print(msg_obj, style="bold green")

    def log_styled(self, msg_obj=None, style: Optional[str] = None) -> None:
        self._console.print(msg_obj, style=style)

    def warning(self, msg_obj=None) -> None:
        self._console.print(msg_obj, style="bold yellow")

    def error(self, msg_obj=None) -> None:
        self._console.print(msg_obj, style="bold red")
```
This class may seem overkill right now, but we will extend it later to display our tweets.
Let's use this to write a function that prompts users for their Client ID and Secret. The Rich `Panel` class allows us to create a pretty slick looking prompt. We will pair this prompt with `getpass` and `input` to get the required information from our users. We can store all this in a new file: `apikey.py`.

```python
# apikey.py
import getpass
from typing import Tuple

from rich import box
from rich.panel import Panel

from .display import Display

# module-level Display instance shared by these helpers
display = Display()

DEVELOPER_DASHBOARD_URL = "https://developer.twitter.com/en/portal/dashboard"


def prompt_api_details() -> Tuple[str, str, str]:
    api_prompt = Panel(
        f"""
        You can find your API keys :key: on your Twitter App Dashboard
        [blue underline bold][link={DEVELOPER_DASHBOARD_URL}]here[/link][/blue underline bold]
        """,
        box=box.ROUNDED,
    )
    display.log_styled(api_prompt, style="yellow")
    display.log(
        "Paste the Client ID, Secret and App Name from your profile and hit enter: "
    )
    client_id = getpass.getpass(prompt="Client ID 🔑 ")
    client_secret = getpass.getpass(prompt="Client Secret 🕵️ ")
    app_name = input("App Name ✏️ ")
    return (client_id, client_secret, app_name)
```
You can see that Rich supports loads of great features we can take advantage of, such as easy styling, hyperlinks and more! We've styled the dashboard URL as a link, allowing users to navigate straight to the Developer Dashboard.
Now that we have our prompt written, let's quickly update our login function to see how it looks:

```python
# cli.py
def login(relogin):
    (client_id, client_secret, app_name) = prompt_api_details()
    click.echo(
        f"""🔐 Your Super Secret Credentials 🔐

        Client ID: {client_id}
        Client Secret: {client_secret}
        App Name: {app_name}"""
    )
```
As we can see, our TUI is really coming together! Let's move on to storing the user input.
Fetching our Bearer Token
We've now got our user credentials, but they aren't the final piece of the puzzle. We need to exchange them for a Bearer Token via the Twitter API. Instead of manually creating a POST request to fetch the token, we can leverage the `requests_oauthlib` library to make the exchange easier. Let's define a function, `request_access_token`, which will take in our `client_id` and `client_secret` and return us a fresh Bearer Token:

```python
# apikey.py
from oauthlib.oauth2 import BackendApplicationClient
from requests.auth import HTTPBasicAuth
from requests_oauthlib import OAuth2Session

REQUEST_TOKEN_URL = "https://api.twitter.com/oauth2/token"


def request_access_token(client_id: str, client_secret: str) -> str:
    auth = HTTPBasicAuth(client_id, client_secret)
    client = BackendApplicationClient(client_id=client_id)
    oauth = OAuth2Session(client=client)
    try:
        token = oauth.fetch_token(token_url=REQUEST_TOKEN_URL, auth=auth)
        return token["access_token"]
    except Exception as err:
        display.error(f"{err}")
        raise ValueError(err)
```
Again, let's modify our `cli.py` to quickly test this out:

```python
# cli.py
def login(relogin):
    (client_id, client_secret, app_name) = prompt_api_details()
    bearer_token = request_access_token(client_id, client_secret)
    click.echo(f"Your bearer token is: {bearer_token}")
```
As we can see, we have successfully retrieved a new Bearer Token! Let's move on to storing this token in our `netrc` file.
Writing and reading the `netrc` file
To store our token in the `netrc` file, we need some way to create or modify an entry and write it. Unfortunately, the `netrc` module from the standard library doesn't actually provide the ability to write to the `netrc` file. Luckily for us, we can take some inspiration from the excellent Weights and Biases client to see how they've written to the `netrc` file with their `write_netrc` function.
In addition to writing to the file, we need a function that can check if an entry already exists (this will prevent the need for a user to log in every time). For this we can rely on `_find_netrc_api_key`, again from Weights and Biases.
These functions are quite long, but not terribly complex. For the sake of brevity we've omitted them here, but you can check them out in their full glory here.
Make sure to include these functions in your `apikey.py`. You'll see how we've used these functions in the following sections.
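If you just want the gist of what those helpers do, here is a minimal, hypothetical stand-in built on the stdlib `netrc` parser. This is *not* the Weights and Biases implementation - it assumes one simple entry per host and skips their edge-case handling:

```python
import netrc
import os
import stat


def write_netrc_entry(path: str, host: str, login: str, password: str) -> None:
    """Sketch: drop any existing entry for `host`, then rewrite the file."""
    entries = {}
    if os.path.exists(path):
        # hosts maps machine -> (login, account, password)
        entries.update(netrc.netrc(path).hosts)
    entries[host] = (login, None, password)
    with open(path, "w") as f:
        for machine, (user, _account, secret) in entries.items():
            f.write(f"machine {machine}\n  login {user}\n  password {secret}\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # netrc files should be 0600


def find_netrc_entry(path: str, host: str):
    """Sketch: return (login, password) for `host`, or None if absent."""
    if not os.path.exists(path):
        return None
    auth = netrc.netrc(path).authenticators(host)
    return (auth[0], auth[2]) if auth else None
```

The wandb versions additionally handle URL parsing and malformed files, which is why we recommend using them directly.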
Tying it all together
Now that we can write and read from the `netrc`, let's tie it all together in our CLI.
First, we will define a function that uses `_find_netrc_api_key` and returns our app name and token separately.

```python
# apikey.py
def fetch_credentials(api_url: str) -> Tuple[str, str]:
    agent, token = None, None
    auth = _find_netrc_api_key(api_url, True)
    if auth and auth[0] and auth[1]:
        agent = auth[0]
        token = auth[1]
        return (agent, token)
    else:
        raise ValueError(
            f"Could not find entry in netrc file for provided URL: {api_url}"
        )
```
And with that, we have all the pieces we need to finish off our `login` function! Note that `fetch_credentials` raises rather than returning `None` when no entry exists, so we catch that case. Check it out below:

```python
# cli.py
TWITTER_API = "https://api.twitter.com"  # host our netrc entry is keyed on


def login(relogin):
    try:
        fetch_credentials(TWITTER_API)
        apikey_configured = True
    except ValueError:
        apikey_configured = False
    if relogin:
        apikey_configured = False
    if not apikey_configured:
        (client_id, client_secret, app_name) = prompt_api_details()
        token = request_access_token(client_id, client_secret)
        write_netrc(TWITTER_API, app_name, token)
    else:
        click.echo("You're already logged in! 🎉")
```
Our flow looks great! And if we test it out and `cat` the contents of our `netrc` file, we can see:

```
machine api.twitter.com
  login SliceOfML
  password AAAAAAAAAAAAAAAAAAAAAM2qegEAAAAAcdvqnZQrt...
```
Success!
Getting the right Tweets
Now that we've finished off our `login` function, let's dig into the `slice` command and see how we can fetch the tweets we are looking for. Unfortunately, the v2 Twitter API doesn't offer an easy-to-use endpoint for fetching popular tweets. However, we are more than capable of building our own. Exploring the Twitter API docs leads us to the handy `/tweets/search/recent` endpoint, which fetches the last 7 days of tweets.
Let's create a new file, `api.py`, and start a very basic `API` class containing a `requests` client:

```python
# api.py
import requests

from .display import Display


class API:
    def __init__(self, user_agent: str, bearer_token: str, api_url: str) -> None:
        self._session = requests.Session()
        self._api_url = api_url
        self._request_url = self._api_url + "/2/tweets/search/recent"
        self._page_size = 100
        self._max_pages = 100
        self._user_agent = user_agent
        self._bearer_token = bearer_token
        self._display = Display()

    def bearer_oauth(self, r):
        r.headers["Authorization"] = f"Bearer {self._bearer_token}"
        r.headers["User-Agent"] = self._user_agent
        return r  # requests auth callables must return the request

    def _get_request(self, url: str) -> requests.Response:
        # small wrapper so every request carries our auth headers
        return self._session.get(url, auth=self.bearer_oauth)

    def query(self, frequency: str) -> None:
        response = self._get_request(self._request_url)
        print(response.json())
```
Have a go at plugging this into your CLI function. You'll find that all the tweets returned are pretty irrelevant to us, but it's great to make first contact!
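One subtlety worth knowing: a `requests` auth callable receives the `PreparedRequest` and must return it. You can check that the headers get attached correctly without ever touching the network by calling `prepare()` yourself. Here's a sketch with dummy token and agent values:

```python
import requests

# Hypothetical values purely for illustration.
BEARER_TOKEN = "dummy-token"
USER_AGENT = "SliceOfML"


def bearer_oauth(r):
    """Auth callable: attach our headers and hand the request back."""
    r.headers["Authorization"] = f"Bearer {BEARER_TOKEN}"
    r.headers["User-Agent"] = USER_AGENT
    return r


# prepare() applies the auth callable without sending anything.
prepared = requests.Request(
    "GET",
    "https://api.twitter.com/2/tweets/search/recent",
    auth=bearer_oauth,
).prepare()
print(prepared.headers["Authorization"])
```

This trick is handy for unit-testing auth logic in isolation from the Twitter API itself.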
Filtering
Now that we've fetched at least some tweets, let's start homing in on the ones we want. For that, we need to define some good filters. The 'High Quality Filters' tutorial gives a deep dive on tailoring the API, but to summarize, the functionality we are interested in is Tweet Annotations. Twitter tags each tweet with both Entity Annotations (NER) and Context Annotations. We can use the `context_annotations` to fetch only ML-related tweets.
They offer a handy CSV on their GitHub with every context annotation listed. It's as simple as searching the CSV to find 'Machine Learning'. Each entry consists of the `domain_id`, `entity_id` and `entity_name`. We can see that ML falls under the Interests and Hobbies category, with a `domain_id` of `66` and an `entity_id` of `852262932607926273`.
Putting these together, we can now build a new URL in the following format:

```
https://api.twitter.com/2/tweets/search/recent?query=context%3A66.852262932607926273
```
Just this would get us all the ML-related tweets for the past 7 days, which isn't far off what we want. However, there are a few more parameters we need to enrich our final 🍰 of ML.
- `tweet.fields=public_metrics` - Allows us to fetch the info we will need for sorting our tweets by popularity.
- `expansions=author_id` - Enriches our API response with information about the user. We are particularly interested in the username.
- `max_results` - By default, the Twitter API caps the number of results at 100. We can use this parameter, in conjunction with the `next_token`, to fetch multiple pages of results and collect them.
- `start_time` - The `tweets/search/recent` endpoint gives us results for the past 7 days, which means we get our `weekly` functionality for free. However, if we want to offer the `daily` option, we also need to provide an RFC3339 formatted timestamp.
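Before baking these into our client, we can sanity-check the query string these parameters produce using only the standard library. The values below mirror the ones discussed in this post:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assemble the parameters discussed above.
params = {
    "query": "context:66.852262932607926273",
    "tweet.fields": "public_metrics",
    "expansions": "author_id",
    "max_results": 100,
}

# For the daily slice, add an RFC3339 start_time 24 hours in the past.
start = datetime.now(timezone.utc) - timedelta(days=1)
params["start_time"] = start.strftime("%Y-%m-%dT%H:%M:%SZ")

url = "https://api.twitter.com/2/tweets/search/recent?" + urlencode(params)
print(url)
```

Note how `urlencode` percent-encodes the `:` in the context filter for us, matching the `context%3A...` form of the URL above.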
Let's define a function, `_build_url`, in our `API` class that we can use to append all the parameters we are interested in.

```python
# api.py — inside the API class
# (needs: from datetime import datetime, timedelta; from typing import Optional)
def _build_url(self, next_token: Optional[str], frequency: str) -> str:
    # domain 66 (Interests and Hobbies), entity 852262932607926273 (Machine Learning)
    query_url = (
        f"{self._request_url}?query=context:66.852262932607926273"
        "&tweet.fields=public_metrics"
        f"&max_results={self._page_size}"
        "&expansions=author_id"
    )
    if next_token and len(next_token) > 1:
        query_url = f"{query_url}&next_token={next_token}"
    if frequency == "daily":
        timestamp = datetime.utcnow() + timedelta(days=-1)
        query_url = f"{query_url}&start_time={timestamp.isoformat('T')}Z"
    return query_url
```
We will need to enhance our `query` function in order to drive the new pagination functionality. We should also extract only the fields we are interested in for forwarding to our final display function. These are:

- `id` - Every tweet is assigned a unique ID. This Twitter blog post gives us a handy trick for building a live Tweet URL just from the ID.
- `text` - The meat of the tweet!
- `like_count` - We will be sorting our tweets by likes as a proxy for popularity.
- `username` - We need to build a map of `user_id => username` in order to correctly match up a tweet to its author.
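To make the extraction concrete before we wire it into the client, here's a sketch against a tiny hand-made fixture shaped like a recent-search response (the tweets, IDs and usernames are all invented):

```python
# A minimal fixture mimicking the shape of a /tweets/search/recent response.
sample = {
    "data": [
        {"id": "1", "text": "Transformers are eating the world", "author_id": "u1",
         "public_metrics": {"like_count": 42, "retweet_count": 7}},
        {"id": "2", "text": "New SOTA on ImageNet", "author_id": "u2",
         "public_metrics": {"like_count": 7, "retweet_count": 1}},
    ],
    "includes": {"users": [{"id": "u1", "username": "ml_fan"},
                           {"id": "u2", "username": "sota_bot"}]},
    "meta": {"next_token": None},
}

# Build the user_id => username map from the expansion data...
user_map = {u["id"]: u["username"] for u in sample["includes"]["users"]}

# ...then pull out exactly the four fields we listed above.
tweets = [
    (t["id"], t["text"], t["public_metrics"]["like_count"], user_map[t["author_id"]])
    for t in sample["data"]
]
print(tweets)
```

This is exactly the per-page work our `query` function performs, minus the pagination loop.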
Putting this all together, you can see our query function defined below:

```python
# api.py — inside the API class
def query(self, frequency: str):
    page, next_token, user_map, tweets = 0, "", {}, []
    while page < self._max_pages and next_token is not None:
        response = self._get_request(self._build_url(next_token, frequency))
        json = response.json()
        next_token = json["meta"].get("next_token")
        for user in json["includes"]["users"]:
            user_map[user["id"]] = user["username"]
        for tweet in json["data"]:
            tweets.append((
                tweet["id"],
                tweet["text"],
                tweet["public_metrics"]["like_count"],
                user_map.get(tweet["author_id"]),
            ))
        page += 1
    print(tweets)
    return tweets
```
Have a go at printing out the fields we extracted before we move on to displaying them.
Displaying our tweets
To display our tweets, we can again lean on Rich and start enhancing our previously overkill `Display` class. We can use the `Table` class to display each tweet as a row.

Check out the `Table` docs for all the different ways you can customize the table. For clarity, we've written a few helper functions to build the Profile and Tweet links. We've also sorted the tweets by like count and taken the top 10, so you only get the best of ML Twitter!

```python
# display.py — additional methods on the Display class
# (needs: from typing import List; from rich import box; from rich.table import Table)
TWITTER_BASE = "https://twitter.com"  # class attribute referenced as self.TWITTER_BASE

def buildProfileLink(self, username: str) -> str:
    return f"[bold blue][link={self.TWITTER_BASE}/{username}]@{username}[/link][/bold blue]"

def buildTweetLink(self, _id: str) -> str:
    return f"[bold blue][link={self.TWITTER_BASE}/twitter/status/{_id}]View Tweet[/link][/bold blue]"

def tweetsAsTable(self, tweets: List, frequency: str) -> None:
    tweets.sort(reverse=True, key=lambda t: t[2])
    tweets = tweets[:10]
    table = Table(
        show_header=True,
        box=box.ROUNDED,
        show_lines=True,
        padding=(0, 1, 1, 0),
        border_style="yellow",
        caption_style="not dim",
    )
    table.title = f"[not italic]🍰 Your {frequency} Slice of ML 🍰[/not italic]"
    table.caption = "Made with ❤️ by the team at [bold blue][link=https://notia.ai]Notia[/link][/bold blue]"
    table.add_column("Username 🧑", justify="center")
    table.add_column("Tweet 🐦", justify="center", header_style="bold blue", max_width=100)
    table.add_column("Tweet Link 🔗", justify="center")
    table.add_column("Likes ❤️", justify="center", header_style="bold red")
    for tweet in tweets:
        table.add_row(
            self.buildProfileLink(tweet[3]),
            tweet[1],
            self.buildTweetLink(tweet[0]),
            str(tweet[2]),
        )
    self._console.print(table)
```
Finally, connect all this up to our `slice` command in `cli.py` like so:

```python
# cli.py
def slice(frequency):
    credentials = fetch_credentials(TWITTER_API)
    tweets = API(credentials[0], credentials[1], TWITTER_API).query(frequency)
    Display().tweetsAsTable(tweets, frequency)
```
And there you have it! Your very own authenticated, interactive Python CLI. Check out the GitHub for an up-to-date version of the code, or just `pip install sliceofml` to get your daily slice of Machine Learning!