MEO API



Welcome to the Media Ecosystem Observatory's API! This API gives you access to the data collected by the MEO team.

Installation, Imports, Essentials

Querying the API from a Python script requires the requests library, which does not ship with Python; install it first if needed (e.g. pip install requests). The os and shutil modules are part of the standard library. Then import what you need:

import os, shutil, requests

Furthermore, all queries are sent to the following host, which we will call the base_url and refer to throughout this documentation.

base_url = "https://meoinsightshub.net:8000"

To build a request for the API, always use the same format:

response = requests.post(f"{base_url}/{endpoint}/", json=data, headers=headers, params=params)

where base_url is given in the previous section, data is the dictionary of query parameters, headers is the dictionary containing the authentication token, and params is either None or a dictionary of URL query-string parameters. The next sections explain each of these in turn!
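That call pattern can be wrapped in a small helper so every endpoint is queried the same way. The sketch below is illustrative only (post_endpoint is not part of the API); it assumes headers already contains a valid token, as described in the next section:

```python
import requests

base_url = "https://meoinsightshub.net:8000"

def post_endpoint(endpoint: str, payload: dict, headers: dict, params=None) -> dict:
    """POST a query to an MEO API endpoint and return the decoded JSON body."""
    url = f"{base_url}/{endpoint}"
    response = requests.post(url, json=payload, headers=headers, params=params)
    response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return response.json()
```

For example, post_endpoint("search", data, headers) would then replace an explicit requests.post call to the /search endpoint.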

Authentication (headers)

To authenticate, we need to generate a token that will be used in all subsequent queries. Assuming you've been given your username and password, this can be done in two ways:

  • by a GET request: simply open your browser and visit https://meoinsightshub.net:8000/meologin?username=username&password=password
  • by sending a POST request to the /meologin endpoint. This is done by passing a dictionary {"username": username, "password": password} where username and password are, respectively, your assigned username and password. If the query is successful, your access token can be accessed in the json attribute using the access_token key. The full code will look like this:
username: str = 'XXXXXXXXXX'
password: str = 'XXXXXXXXXX'
base_url = "https://meoinsightshub.net:8000"

def get_token(base_url, username, password):
    params = {"username": username, "password": password}
    response = requests.post(f"{base_url}/meologin", params=params)
    if response.status_code == 200:
        return response.json()["access_token"]
    return None

When making an HTTP request with the requests library, we need to verify that the request went through by checking the value of response.status_code. This attribute contains the HTTP status code, which follows standard HTTP conventions. You will often see:


  • 200 : Successful Response - the API call was successful
  • 400 : Logging in Error
  • 422 : Validation Error
  • 500 : Internal server error - there was an error in processing the query. This is usually because problematic parameters were passed.
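A small guard like the following keeps status handling in one place. This is an illustrative sketch, not part of the API; the descriptions simply restate the list above:

```python
def check_response(status_code: int) -> str:
    """Map common MEO API status codes to a short description."""
    messages = {
        200: "Successful Response - the API call was successful",
        400: "Logging in Error - check your username and password",
        422: "Validation Error - a parameter had the wrong type or format",
        500: "Internal server error - likely a problematic query parameter",
    }
    # fall back to a generic message for anything not listed above
    return messages.get(status_code, f"Unexpected status code: {status_code}")
```

After any call you could then print check_response(response.status_code) to get a readable diagnosis.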

After obtaining your access token, you'll need to include it in the request headers to make authenticated API calls. This is done by creating a dictionary in Python that specifies the Authorization header, following the Bearer Token scheme.

headers = {
    "Authorization": f"Bearer {TOKEN}"  # Add Bearer token to the headers
}

# make an API call
response = requests.get(url, headers=headers)

Endpoints

Below is a list of the available endpoints:



POST /meologin

Retrieves the authentication token for an API account given a valid username and password.

POST https://meoinsightshub.net:8000/meologin

Parameter Description
username A string with the username that was attributed to you.
password The corresponding password.

Output

The output is a dictionary dict, summarised below, which can be accessed at response.json().

Key Datatype Value description
access_token str The unique access token associated to your account.
status int Status of the request.
token_type str Description of the use case of the token.


Code example & response

Below, we give an example of a call to the endpoint and replace the username, password and access token with 'XXXXXXXX'.

params: dict = {"username": "XXXXXXXXX", "password": "XXXXXXXXX"}
response: requests.Response = requests.post(f"{base_url}/meologin", params=params)
response.json()
>> {
    'access_token': 'XXXXXXXXXXXX',
    'status': 200,
    'token_type': 'bearer'
}

The token needs to be passed to the headers parameter for all endpoint calls in a dictionary which should be built using the next bit of code:

TOKEN: str = response.json()["access_token"]
headers = {
    "Authorization": f"Bearer {TOKEN}"  # Add Bearer token to the headers
}

POST /search

POST https://meoinsightshub.net:8000/search

Retrieves a page of Facebook posts, YouTube videos, TikTok posts or Instagram posts matching the query requirements between a range of dates and where the maximum number of returned results is passed as a parameter.

Parameters

Parameter Type Description Options
platform str The platform you want to get data from. facebook, instagram, tiktok or youtube e.g "platform":"youtube"
query str Search query you want to use; simple keywords work. See the last section for how to build a query
from_date str The minimum date from which to fetch posts. Any date in the format "DD-MM-YYYY" e.g. "from_date":"12-08-2023"
to_date str The maximum date up to which to get posts. Any date in the format "DD-MM-YYYY" where to_date > from_date e.g "to_date":"12-10-2023"
size str (Optional) Number of posts to return by Elasticsearch. Default: "size":10000. Any int n where 0 < n <= 10,000 e.g "size":10
sort_field str (Optional) Column by which to sort the posts returned. Any column that can be sorted e.g. "sort_field":"_id"
sort_type str (Optional) Sort ordering. Default: "sort_type":'desc'. Ascending 'asc' or descending 'desc'
select_fields list[str] (Optional) Selects the fields to return. Default: all fields. Any field that would be returned by the query for that platform e.g. for Facebook select_fields: ["history48.actual.likeCount", "history48.actual.angryCount"]

Output

A list of dictionaries list[dict] whose keys correspond to the fields of the returned documents. A table with all the fields available for each social media platform is given in the section Building a query. The output dictionary of response.json() is summarized in the following table:

Key Datatype Value description
data list[dict] A list of dictionaries list[dict] with the returned query data where every dictionary is the information of one post.
recordsTotal int The total number of posts matching the query requirement.
recordsFiltered int The number of posts returned (i.e. the length of the list of dictionaries).

Code example & response

In the following example we search Facebook data for 10,000 posts that contain "Justin" between the 5th of January 2023 and the 5th of February 2024 and whose results we want to be sorted by the post date.

data = {
    'platform': 'facebook',
    'query': 'Justin',
    'from_date': '05-01-2023',
    'to_date': '05-02-2024',
    'size': '10000',
    'sort': 'date'
}
response: requests.Response = requests.post(f'{base_url}/search', json=data, headers=headers)
response.json()['data'][0]
>> {
    'id': '173308|921960019286389',
    'platformId': '100044171992369_921960019286389',
    'platform': 'Facebook',
    'date': '2024-01-05T18:14:51-05:00',
    'updated': '2024-01-30T17:29:08-05:00',
    'type': 'status',
    'title': 'N/A',
    'caption': 'N/A',
    'description': 'N/A',
    'message': 'Nous construisons plus de logements, plus rapidement. Aujourd’hui, le ministre...',
    'expandedLinks': [],
    'link': 'N/A',
    'postUrl': 'https://www.facebook.com/100044171992369/posts/921960019286389',
    'subscriberCount': 8681703,
    'score': -1.1009345794392524,
    'statistics': {
        'actual': {'likeCount': 719, 'shareCount': 44, 'commentCount': 847, 'loveCount': 99, 'wowCount': 2, 'hahaCount': 410, 'sadCount': 1, 'angryCount': 18, 'thankfulCount': 0, 'careCount': 18},
        'expected': {'likeCount': 726, 'shareCount': 55, 'commentCount': 1111, 'loveCount': 114, 'wowCount': 3, 'hahaCount': 323, 'sadCount': 4, 'angryCount': 20, 'thankfulCount': 0, 'careCount': 19}
    },
    'account': {
        'id': 173308,
        'name': 'Justin Trudeau',
        'handle': 'JustinPJTrudeau',
        'subscriberCount': 8677043,
        'accountType': 'facebook_page',
        'pageAdminTopCountry': 'CA',
        'pageDescription': 'Online Community Guidelines: lpc.ca/a17v.\n\nLignes directrices...',
        'pageCreatedDate': '2007-12-13T20:42:31',
        'pageCategory': 'POLITICIAN'
    },
    'history24': {
        'actual': {'likeCount': 528, 'shareCount': 28, 'commentCount': 593, 'loveCount': 70, 'wowCount': 2, 'hahaCount': 287, 'sadCount': 1, 'angryCount': 12, 'thankfulCount': 0, 'careCount': 12},
        'expected': {'likeCount': 565, 'shareCount': 45, 'commentCount': 939, 'loveCount': 86, 'wowCount': 4, 'hahaCount': 256, 'sadCount': 4, 'angryCount': 17, 'thankfulCount': 0, 'careCount': 15},
        'timestep': 38,
        'date': '2024-01-06T23:02:31',
        'score': -1.2596975673898752
    },
    'history48': {
        'actual': {'likeCount': 612, 'shareCount': 37, 'commentCount': 725, 'loveCount': 86, 'wowCount': 2, 'hahaCount': 348, 'sadCount': 1, 'angryCount': 14, 'thankfulCount': 0, 'careCount': 16},
        'expected': {'likeCount': 608, 'shareCount': 48, 'commentCount': 990, 'loveCount': 95, 'wowCount': 4, 'hahaCount': 281, 'sadCount': 4, 'angryCount': 18, 'thankfulCount': 0, 'careCount': 15},
        'timestep': 46,
        'date': '2024-01-07T23:00:29',
        'score': -1.1221917808219177
    },
    'history72': {
        'actual': {'likeCount': 668, 'shareCount': 39, 'commentCount': 761, 'loveCount': 94, 'wowCount': 2, 'hahaCount': 375, 'sadCount': 1, 'angryCount': 14, 'thankfulCount': 0, 'careCount': 17},
        'expected': {'likeCount': 637, 'shareCount': 49, 'commentCount': 1015, 'loveCount': 99, 'wowCount': 4, 'hahaCount': 292, 'sadCount': 4, 'angryCount': 18, 'thankfulCount': 0, 'careCount': 16},
        'timestep': 50,
        'date': '2024-01-08T22:44:52',
        'score': -1.0839303991811668
    },
    'seed_id': 2294,
    'seed_name': 'Justin Trudeau',
    'seed_type': 'politician',
    'seed_subtype': 'MP',
    'seed_province': 'Quebec',
    'seed_party': 'Liberal',
    'seed_link': 'https://www.facebook.com/JustinPJTrudeau/',
    'seed_news_category': ''
}
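Once the posts are in hand, the nested statistics blocks lend themselves to simple analysis. The sketch below is illustrative only; it assumes each post dictionary follows the structure shown above (the two sample posts are hypothetical):

```python
def total_reactions(posts: list, reaction: str = "likeCount") -> int:
    """Sum one 'actual' reaction count across a list of /search results."""
    return sum(p["statistics"]["actual"].get(reaction, 0) for p in posts)

# minimal hypothetical posts mirroring the response shape above
posts = [
    {"statistics": {"actual": {"likeCount": 719, "angryCount": 18}}},
    {"statistics": {"actual": {"likeCount": 100, "angryCount": 2}}},
]
print(total_reactions(posts))                 # 819
print(total_reactions(posts, "angryCount"))   # 20
```

In a real script, posts would simply be response.json()['data'].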

POST /search_scroll

POST https://meoinsightshub.net:8000/search_scroll

We saw previously that the /search endpoint returns at most a user-defined maximum of 10,000 posts. If you want to fetch all Facebook posts, YouTube videos, TikTok posts or Instagram posts matching a query and there are more than 10,000 of them, /search_scroll is the appropriate endpoint. It works very similarly to /search except that you also pass a scroll_id parameter, which identifies a search context: the first query returns size results together with a scroll_id, which can be passed to the next scrolling search request to return the next batch of size results (or however many remain). Passing the scroll_id to each subsequent scrolling search call keeps returning the next chunk until all query results have been returned. You can think of /search as returning the first page of results, while each call to /search_scroll returns the next page.


Note:

  • in our case, search contexts designated by their scroll_id only remain active for 2 minutes and will be the same for all calls to the endpoint during that time.
  • the first time /search_scroll is called, set scroll_id = None and update it after each subsequent call with the returned id

Parameter Type Description Options
platform str The platform you want to get data from. facebook, instagram, tiktok or youtube e.g "platform":"youtube"
query str Search query you want to use; simple keywords work. See the last section for how to build a query
from_date str The minimum date from which to fetch posts. Any date in the format "DD-MM-YYYY" e.g. "from_date":"12-08-2023"
to_date str The maximum date up to which to get posts. Any date in the format "DD-MM-YYYY" where to_date > from_date e.g "to_date":"12-10-2023"
scroll_id str String id of the search context. None or the scroll_id returned by a previous call to the endpoint
size str (Optional) Number of posts to return by Elasticsearch. Default: "size":10000. Any int n where 0 < n <= 10,000 e.g "size":10
sort_field str (Optional) Column by which to sort the posts returned. Any column that can be sorted e.g. "sort_field":"_id"
sort_type str (Optional) Sort ordering. Default: "sort_type":'desc'. Ascending 'asc' or descending 'desc'
select_fields list[str] (Optional) Selects the fields to return. Default: all fields. Any field that would be returned by the query for that platform e.g. for Facebook select_fields: ["history48.actual.likeCount", "history48.actual.angryCount"]

Output

The output is very similar to that of the /search endpoint. The output dictionary of response.json() is summarized in the following table:

Key Datatype Value description
data list[dict] A list of dictionaries list[dict] with the returned query data where every dictionary is the information of one post.
recordsTotal int The total number of posts matching the query requirement.
recordsFiltered int The number of posts returned (i.e. the length of the list of dictionaries).
scroll_id str Search context ID whose value is a string or None

Code example & response

Below, we perform a scrolling search for all posts posted between the 5th of January 2024 and the 5th of February 2024, fetching 1,000 posts at a time.

Note how we write the endpoint call in a while True loop, incrementally appending data to a list all_data, and break out of the loop once we've fetched all posts matching our search conditions, i.e. when response.json()["recordsFiltered"] == 0. Finally, scroll_id is passed to the endpoint call in a dictionary whose value is either None the first time the endpoint is called, or the scroll_id returned by the previous call.

all_data: list[dict] = []
scroll_id = None

while True:
    params = {
        "platform": 'youtube',
        "query": "",
        'from_date': '05-01-2024',
        'to_date': '05-02-2024',
        "size": 1000
    }
    response = requests.post(
        f'{base_url}/search_scroll',
        json=params,
        headers=headers,
        params={"scroll_id": scroll_id} if scroll_id else None
    )
    scroll_id = response.json()['scroll_id']  # update the scroll_id with the id returned by the endpoint call
    new_data = response.json()['data']
    for d in new_data:  # iterate through the new data and append to the list
        all_data.append(d)
    if response.json()["recordsFiltered"] == 0:
        print("No more data: recordsFiltered = 0")
        break

print(scroll_id)
len(all_data)
>> 'FGluY2x1ZGVfY29udGxxxx'
9097


POST /timeline

POST https://meoinsightshub.net:8000/timeline

This endpoint performs a date-based aggregation over a specified search query to create a histogram. Each bucket corresponds to a time interval defined by the agg_time_interval parameter, allowing posts to be tallied within each specified interval.

Parameter Description Options
platform The platform you want to get data from facebook, instagram, tiktok or youtube e.g "platform":"youtube"
query Elastic search query you want to use See last section for how to build a query
from_date The minimum date from which to fetch posts Any date after 2023 in the format "DD-MM-YYYY" e.g. "from_date":"12-08-2023"
to_date The maximum date up to which to get posts Any date in the format "DD-MM-YYYY" where to_date > from_date e.g "to_date":"12-10-2023"
agg_time_interval The time interval over which data should be aggregated. The format is {n}{unit} where n is a number and unit is y (year) or d (day) e.g. "agg_time_interval": "1d"

Output

The output is a list of dictionaries list[dict] where each dictionary corresponds to one aggregation interval (one day when agg_time_interval is "1d") in the specified time span. The list of dictionaries can be found in response.json()['timeline'] and is summarized below.

Key Value description
count Number of documents returned by the query for that day.
date The specific day over which the number of posts were calculated.


Code example & response

Below we get the number of posts per day between the 5th and 10th of January 2024 for posts containing Trudeau.

data = {
    'platform': 'facebook',
    'query': 'Trudeau',
    'from_date': '05-01-2024',
    'to_date': '10-01-2024'
}
response: requests.Response = requests.post(f"{base_url}/timeline", json=data, headers=headers)
response.json()['timeline']
>> [
    {'count': 111, 'date': '2024-01-05T00:00:00'},
    {'count': 25, 'date': '2024-01-06T00:00:00'},
    {'count': 20, 'date': '2024-01-07T00:00:00'},
    {'count': 72, 'date': '2024-01-08T00:00:00'},
    {'count': 96, 'date': '2024-01-09T00:00:00'}
]
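The timeline output drops straight into further computation. For example, the total number of matching posts over the span is just the sum of the daily counts; the sketch below hard-codes the response shown above rather than calling the API:

```python
# the timeline returned by the example call above
timeline = [
    {'count': 111, 'date': '2024-01-05T00:00:00'},
    {'count': 25, 'date': '2024-01-06T00:00:00'},
    {'count': 20, 'date': '2024-01-07T00:00:00'},
    {'count': 72, 'date': '2024-01-08T00:00:00'},
    {'count': 96, 'date': '2024-01-09T00:00:00'},
]

total = sum(day["count"] for day in timeline)          # total posts over the span
busiest = max(timeline, key=lambda day: day["count"])  # day with the most posts
print(total)             # 324
print(busiest["date"])   # 2024-01-05T00:00:00
```

In a real script, timeline would simply be response.json()['timeline'].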

POST /timeline_advanced

POST https://meoinsightshub.net:8000/timeline_advanced

The /timeline_advanced endpoint works similarly to /timeline but also gives the user the ability to aggregate by a list of fields and compute per-interval metrics from a list of functions. Like /timeline, it computes metrics on a chosen interval; however, unlike /timeline, which only counts the number of posts per interval, /timeline_advanced also computes metrics over a list of specified post fields and lets the user decide what those metrics are (e.g. sum, average, etc.). The number of posts per day is still available via the doc_count value in the returned dictionary.

Parameters

Parameter Type Description Options
platform str The platform you want to get data from. facebook, instagram, tiktok or youtube e.g "platform":"youtube"
query str Search query you want to use; simple keywords work. See last section for how to build a query
from_date str The minimum date from which to fetch posts. Any date in the format "DD-MM-YYYY" e.g. "from_date":"12-08-2023"
to_date str The maximum date up to which to get posts. Any date in the format "DD-MM-YYYY" where to_date > from_date e.g "to_date":"12-10-2023"
agg_field str The fields by which to aggregate the posts returned by the query. A list of comma-separated post attributes e.g. for YouTube we could aggregate by 'seed_id','categoryID'
agg_funct str The function used to compute the desired metrics. Any aggregate function in the list below e.g. 'avg','count','max'
agg_funct_field str The fields over which to perform the computation specified in agg_funct. A list of comma-separated post attributes e.g. for YouTube, we could perform the calculations over comments.likeCount
agg_time_interval str The time interval over which data should be aggregated. The format is {n}{unit} where n is a number and unit is y (year) or d (day) e.g. "agg_time_interval": "1d"

For a full list of all metrics that can be computed after an aggregation, see Elasticsearch's documentation. Some common ones that are available are:


  • avg : the average of numeric values that are extracted from the aggregated documents
  • sum : sums up numeric values that are extracted from the aggregated documents
  • min : keeps track and returns the minimum value among numeric values extracted from the aggregated documents
  • max : keeps track and returns the maximum value among the numeric values extracted from the aggregated documents

Output

The output is a list[dict] where each dict holds, for a single day and a single aggregation group, the values of each agg_funct_field transformed by the chosen function.

The output dictionary can be found at response.json()['timeline'] and is summarized below.

Key Value description
doc_count Number of documents in the aggregation over the query results for that day.
field_{agg_field} Value of the field in agg_field over which the aggregation was called. If the aggregation was called over multiple fields, there will be a separate key field_{name of field} for each one.
date Date over which the results were calculated.
{agg_funct}_{agg_funct_field} Value of the function agg_funct applied to the values of agg_funct_field. If there are multiple fields in agg_funct_field, there will be a unique key-value pair '{agg_funct}_{agg_funct_field}':value for each comma-separated field passed as a parameter.

Code example & response

In the example below, we get a dictionary for each day and each unique combination of account type and seed party between the 5th and 10th of January, and compute the average number of angry reactions and likes the posters got on their posts each day. Since our range runs from the 5th to the 10th, we'll get the average counts for the 5th, 6th, 7th, 8th and 9th of January 2024. For the sake of conciseness, we only show the first few results returned by the endpoint.

data = {
    'platform': 'facebook',
    'query': 'seed_name:Justin Trudeau',
    'from_date': '05-01-2024',
    'to_date': '10-01-2024',
    'agg_funct_field': "history48.actual.likeCount,history48.actual.angryCount",
    'agg_field': "account.accountType,seed_party",
    'agg_funct': "avg",
    "agg_time_interval": '1d'
}
response: requests.Response = requests.post(f"{base_url}/timeline_advanced", json=data, headers=headers)
response.json()['timeline'][0:5]
>> [
    {'doc_count': 32, 'field_account.accountType': 'facebook_page', 'field_seed_party': '', 'date': '2024-01-05 00:00:00', 'avg_history48.actual.likeCount_value': 11.625, 'avg_history48.actual.angryCount_value': 2.0},
    {'doc_count': 18, 'field_account.accountType': 'facebook_page', 'field_seed_party': '', 'date': '2024-01-06 00:00:00', 'avg_history48.actual.likeCount_value': 2.2777, 'avg_history48.actual.angryCount_value': 2.38},
    {'doc_count': 8, 'field_account.accountType': 'facebook_page', 'field_seed_party': '', 'date': '2024-01-07 00:00:00', 'avg_history48.actual.likeCount_value': 2.625, 'avg_history48.actual.angryCount_value': 0.125},
    ...
]

POST /seedlist

POST https://meoinsightshub.net:8000/seedlist

Retrieves information about the seeds (the accounts tracked by MEO) together with their social media links.

Parameters

Parameter Description Options
platform The platform for which we want the seeds information. If we want seed information for all platforms we use 'platform': 'seeds', else we can specify any platform e.g. 'facebook', 'youtube', 'tiktok', 'instagram'
size Number of seeds to return in the call Any int n where 0 < n <= 10,000 e.g "size":10000

Response

data = {
    'platform': 'seeds',
    'query': '',
    'size': 10000
}
response: requests.Response = requests.post(f"{base_url}/seedlist", json=data, headers=headers)
response.json()['data'][0:5]
>> [
    {'Name': 'Pete Guthrie', 'ID': '1000', 'MainType': 'politician', 'SubType': 'MLA', 'Instagram': 'https://www.instagram.com/peterguthrie99/', 'Facebook': 'https://www.facebook.com/peterguthrie99/', 'Youtube': 'https://www.youtube.com/@peterguthrie5322', 'Tiktok': '', 'Province': 'Alberta', 'Party': 'UC', 'Facebook_Alt_URL': 'https://www.facebook.com/863298240496439', 'NewsOutletCategory': ''},
    {'Name': 'Angela Pitt', 'ID': '1001', 'MainType': 'politician', 'SubType': 'MLA', 'Instagram': 'https://www.instagram.com/angelapitt_ucp/', 'Facebook': 'https://www.facebook.com/AngelaPittAirdrie/', 'Youtube': '', 'Tiktok': '', 'Province': 'Alberta', 'Party': 'UC', 'Facebook_Alt_URL': 'https://www.facebook.com/1436991696520809', 'NewsOutletCategory': ''},
    ...
]
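The returned seed records are plain dictionaries, so filtering them is straightforward. The sketch below is illustrative only and assumes records shaped like the response above (the third sample seed is hypothetical):

```python
def seeds_by_province(seeds: list, province: str) -> list:
    """Return the names of seeds located in a given province."""
    return [s["Name"] for s in seeds if s.get("Province") == province]

# minimal records mirroring the response shape above
seeds = [
    {"Name": "Pete Guthrie", "Province": "Alberta", "MainType": "politician"},
    {"Name": "Angela Pitt", "Province": "Alberta", "MainType": "politician"},
    {"Name": "Example Seed", "Province": "Quebec", "MainType": "politician"},
]
print(seeds_by_province(seeds, "Alberta"))  # ['Pete Guthrie', 'Angela Pitt']
```

In a real script, seeds would simply be response.json()['data'].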

Building a query

Kibana Query Language

To write a query you must first understand what syntax to use. The API fetches data that has been indexed in an Elasticsearch database. Queries used for all endpoints must be written in Kibana Query Language (KQL), whose documentation can be found here. A useful Medium article to familiarize yourself with KQL can be found here (though you can ignore the mentions of the Lucene query language) and a handy cheat sheet here.


Fields to query over per platform

The API provides post data collected from several platforms (Facebook, Instagram, YouTube and TikTok), each with its own specific fields and attributes that can be used to filter the data. While we work on adding a full table of all the attributes that can be included in queries, use the example queries for inspiration.


Quick Tip: Identifying Available Fields for Advanced Queries and for Query Parameters

To effortlessly identify the fields available for use in your queries, consider executing a basic query that requests a minimal amount of data, such as setting size=1. This approach retrieves a single data entry, allowing you to examine the structure and fields present in the response. These fields can be instrumental in crafting more sophisticated search queries.
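In practice that probe can look like the sketch below. It is illustrative only: the date range is an arbitrary placeholder, discover_fields is not part of the API, and a valid headers dict from the authentication section is assumed:

```python
import requests

base_url = "https://meoinsightshub.net:8000"

def top_level_fields(post: dict) -> list:
    """Return the sorted top-level field names of a single post."""
    return sorted(post.keys())

def discover_fields(platform: str, headers: dict) -> list:
    """Fetch a single post from /search and list its available fields."""
    data = {
        "platform": platform,
        "query": "",
        "from_date": "01-01-2024",  # placeholder range; any valid window works
        "to_date": "05-01-2024",
        "size": "1",                # one post is enough to inspect the schema
    }
    response = requests.post(f"{base_url}/search", json=data, headers=headers)
    response.raise_for_status()
    posts = response.json()["data"]
    return top_level_fields(posts[0]) if posts else []
```

Calling discover_fields("facebook", headers) would then print field names such as the ones shown in the /search example above.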


Crafting Targeted Search Queries: For instance, searching for "Trudeau" might yield results across various fields, including text, titles, messages, and account names. However, if your goal is to specifically find posts by users with "Trudeau" in their account name, you would refine your query to account.name:Trudeau.


Excluding Specific Content: To capture all posts mentioning "Trudeau" but exclude those authored by accounts with "Trudeau" in the name, you can construct a more nuanced query: Trudeau AND NOT account.name:Trudeau. This formulation ensures you receive content relevant to "Trudeau" without the contributions from users named Trudeau.
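The two query patterns above can be parameterized in a tiny helper. This is illustrative only; the default field name account.name comes from the Facebook examples above:

```python
def mention_excluding_author(term: str, field: str = "account.name") -> str:
    """Build a KQL query matching posts that mention `term` but are not
    authored by an account whose `field` contains it."""
    return f"{term} AND NOT {field}:{term}"

print(mention_excluding_author("Trudeau"))
# Trudeau AND NOT account.name:Trudeau
```

The returned string can be passed directly as the query parameter of any endpoint.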

Utilizing Queries in the Interface:

The same query syntax applied programmatically can also be used directly within the interface at https://meoinsightshub.net/, offering a seamless transition between manual and automated search processes. By familiarizing yourself with the available fields and mastering these query techniques, you can significantly enhance the precision and efficiency of your searches, tailoring the results to meet your exact requirements.