This notebook provides an overview of the key methodology and criteria used by the National Telecommunications and Information Administration (NTIA) to determine location service levels for the Broadband Equity, Access, and Deployment (BEAD) program. The sample code can be used to replicate this methodology for any given state with minimal resources and to export the resulting Broadband Serviceable Location (BSL) service level data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Methodology & Documentation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A. Key Criteria & Constants:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a. Service Levels: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. How does the BEAD program define an “unserved” location?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"Section I.C.bb. of the NOFO defines unserved locations as locations lacking reliable broadband service or with broadband service offering speeds below 25 megabits per second (Mbps) downstream/3 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/ Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. Locations that are served by satellite or purely unlicensed spectrum will also be considered unserved. See BEAD FAQ’s\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ii. How does the BEAD program define an “underserved” location?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"Section I.C.cc. of the NOFO defines underserved locations as locations that are identified as having access to reliable broadband service of at least 25 Mbps downstream/3 Mbps upstream but less than 100 Mbps downstream/20 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband 5 DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/ Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. \n",
"See BEAD FAQ’s\n",
"
\n",
"\n",
"Based on the definition of \"Reliable broadband\" stated above, NTIA includes technology codes listed below in the analysis of a location's max service level.\n",
"\n",
"
\n",
"\n",
"\n",
"Based on the FCC definition of \"low latency\" in the BDC data specification, NTIA classifies service availability with latency above 100 milliseconds as unserved.\n",
"\n",
"\n",
"
\n",
" The BDC dataset indicates low latency status with Boolean codes:\n",
"
Once the BDC data is filtered by the applied service level criteria and joined to the location fabric, NTIA applies the SQL query below to determine a locations max service level.
\n",
"\n",
"
\n",
" \n",
"~~~sql\n",
" -- For each provider reported record, determine the service level (sub-query)\n",
" -- Then select the highest service level for each location \n",
" SELECT\n",
" location_id,\n",
" -- Get the max service level from the sub-query to determine the max service value, map it to an approperiate string value\n",
" (CASE\n",
" WHEN MAX(service_level) = 0 THEN 'Unserved' -- if the max value for a location is 0, it is Unserved\n",
" WHEN MAX(service_level) = 1 THEN 'Underserved' -- if the max value for a location is 1, it is Underserved\n",
" WHEN MAX(service_level) = 2 THEN 'Served' -- if the max value for a location is 2, it is Served\n",
" END) AS loc_max_service_level\n",
" FROM (\n",
" SELECT\n",
" *,\n",
" (CASE\n",
" -- When a provider level records meets the critera for Unserved, return 0\n",
" -- Unserved: Speeds below 25/3 Mbps or NULL OR without low_latency (low_latency=0)\n",
" WHEN low_latency = 0 THEN 0\n",
" WHEN max_advertised_download_speed IS NULL THEN 0\n",
" WHEN max_advertised_upload_speed IS NULL THEN 0\n",
" WHEN max_advertised_download_speed < 25 THEN 0\n",
" WHEN max_advertised_upload_speed < 3 THEN 0\n",
" \n",
" -- When a provider level records meets the critera for Underserved, return 1 \n",
" -- Underserved: Speeds at or above 25/3 Mbps, but Below 100/20 Mbps with low_latency (low_latency=1)\n",
" WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1\n",
" WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1\n",
" \n",
" -- When a provider level records meets the critera for Served, return 2\n",
" -- Served: Service Level at or above 100/20 Mbps with low_latency (low_latency=1)\n",
" WHEN max_advertised_download_speed >= 100 THEN 2\n",
" WHEN max_advertised_upload_speed >= 20 THEN 2\n",
" END) AS service_level\n",
" FROM bdc\n",
" ) t1\n",
" -- Group by the location id to get one record per location \n",
" GROUP BY location_id\n",
"~~~\n",
"
\n",
"\n",
"Python Pandas:\n",
"\n",
"\n",
"
To obtain the same results using pandas, a defined function named \"calculate_service_level\" is provided in the notebook. The \"calculate_service_level\" function wil calculate the service level based on the same service level criteria contained in the SQL sample above.\n",
"
\n",
"\n",
"
"
]
},
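{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SQL above can be sanity-checked locally. The sketch below is illustrative only and is not part of the NTIA workflow: it assumes the optional duckdb package, which can query a pandas DataFrame by variable name, and uses a tiny hypothetical provider-level table named bdc.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: run the documented service level SQL against a toy table\n",
"# Assumes the optional duckdb package (pip install duckdb); values are hypothetical\n",
"import duckdb\n",
"import pandas as pd\n",
"\n",
"bdc = pd.DataFrame({\n",
"    'location_id': [1, 1, 2],\n",
"    'max_advertised_download_speed': [50, 200, 10],\n",
"    'max_advertised_upload_speed': [5, 10, 1],\n",
"    'low_latency': [1, 1, 1],\n",
"})\n",
"\n",
"query = \"\"\"\n",
"SELECT location_id,\n",
"       (CASE WHEN MAX(service_level) = 0 THEN 'Unserved'\n",
"             WHEN MAX(service_level) = 1 THEN 'Underserved'\n",
"             WHEN MAX(service_level) = 2 THEN 'Served' END) AS loc_max_service_level\n",
"FROM (\n",
"    SELECT *,\n",
"           (CASE WHEN low_latency = 0 THEN 0\n",
"                 WHEN max_advertised_download_speed IS NULL THEN 0\n",
"                 WHEN max_advertised_upload_speed IS NULL THEN 0\n",
"                 WHEN max_advertised_download_speed < 25 THEN 0\n",
"                 WHEN max_advertised_upload_speed < 3 THEN 0\n",
"                 WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1\n",
"                 WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1\n",
"                 WHEN max_advertised_download_speed >= 100 THEN 2\n",
"                 WHEN max_advertised_upload_speed >= 20 THEN 2 END) AS service_level\n",
"    FROM bdc) t1\n",
"GROUP BY location_id\n",
"\"\"\"\n",
"\n",
"# Location 1's best offer (200/10 Mbps) is Underserved; location 2 (10/1 Mbps) is Unserved\n",
"duckdb.sql(query).df()\n"
]
},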
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### B. Location records:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. Why must the BDC data be joined to the Location Fabric?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"The location fabric contains all of the broadband serviceable locations (BSL) whereas, the BDC dataset only contains locations where a provider has reported service availability.\n",
"Locations in the location fabric without corresponding service availability in BDC dataset are then labeled as \"unserved\". Without performing this join, these locations will not be counted in the results.\n",
"
"
]
},
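{
"cell_type": "markdown",
"metadata": {},
"source": [
"The toy sketch below (hypothetical values, not part of the NTIA workflow) illustrates why the left join matters: a BSL with no BDC record survives the join with null service values and is later classified as \"Unserved\", whereas an inner join would drop it entirely.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration with hypothetical values: why a left join is required\n",
"import pandas as pd\n",
"\n",
"fabric = pd.DataFrame({'location_id': [1, 2, 3]})  # all BSLs in the Location Fabric\n",
"bdc = pd.DataFrame({'location_id': [1, 2],         # only provider-reported locations\n",
"                    'max_advertised_download_speed': [100, 25]})\n",
"\n",
"# location_id 3 keeps a row with NaN speeds and is later labeled \"Unserved\";\n",
"# an inner join would silently remove it from the results\n",
"fabric.merge(bdc, on='location_id', how='left')\n"
]
},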
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ii. Why use the block_geoid column rather than the state column?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
" While not numerous, some misattribution has occurred in the state column. The block_geoid column was found to be the most reliable column for isolating state records. \n",
" The first two characters of the block_geoid column represent the state FIPS code. \n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### iii. What is a Broadband Serviceable Location (BSL)?"
]
},
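{
"cell_type": "markdown",
"metadata": {},
"source": [
"The FCC defines a broadband serviceable location as a business or residential location in the United States at which fixed broadband Internet access service is, or can be, installed. The bsl_flag column in the Location Fabric identifies these locations, and the analysis below is limited to records where bsl_flag is true.\n"
]
},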
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Downlaod and place the Broadband Data Collection (BDC) zipfiles in a separate directory without any other files. Take note of the full directory path.\n",
"
Note: The BDC CSV files should remain zipped for the code sample to work properly and for memory optimization.\n",
" The sample code below will use all of the zipfiles in the directory as data sources.\n",
" Ensure a file for each \"reliable broadband\" technology type is downloaded. Reference the applied service level criteria for a list of technology types.\n",
"\t\t\t
\n",
" \n",
"
Once the files are in place, adjust the analytic variables in the parameters section. Provide/update each path variable.
\n",
"
Run the notebook with the updated variables. Sample results will be displyed in the notebook while full results will be exported to csv files.
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set technology codes to use in the BDC servicel level analyis\n",
"# As a standard, NTIA includes technology codes (10, 40, 50, 71, 72) in the analysis of a location's max service level\n",
"tech_codes = [10, 40, 50, 71, 72]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### E. Data Processing:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a. Obtain input file info: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Get a list of all zipped BDC CSV files in the directory\n",
"zipped_csv_files = glob.glob(f\"{bdc_dir_path}/*.zip\")\n",
"\n",
"# Iterate over the list of zipped CSV files to print each path and a file count\n",
"for number, zipped_file in enumerate(zipped_csv_files):\n",
" print(number + 1, zipped_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the state FIPS code for the BDC files, the FIPS code will be used to query the location fabric\n",
"state_fips = zipped_csv_files[0].split('_')[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### b. Read BDC data: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a list of columns to read. By not reading all available columns, memory usage is optimized\n",
"use_columns = [\n",
" 'location_id',\n",
" 'block_geoid',\n",
" 'technology',\n",
" 'max_advertised_download_speed',\n",
" 'max_advertised_upload_speed',\n",
" 'low_latency',\n",
" ]\n",
"\n",
"# Set the BDC columns to treat as integers\n",
"int_cols = [\n",
" 'technology', 'max_advertised_download_speed', \n",
" 'max_advertised_upload_speed', 'low_latency'\n",
" ]\n",
"\n",
"# Create a list to store DataFrames\n",
"dataframes = []\n",
"\n",
"# Iterate over the list of zipped CSV files\n",
"for zipped_file in zipped_csv_files:\n",
"\n",
" # Read the zipped BDC CSV file into a DataFrame, ensure to only read the selected state records\n",
" # All columns will be of type \"object\" (str) which ensures pandas.concat will not incure datatype errors\n",
" _df = pd.read_csv(zipped_file, \n",
" compression=\"zip\", \n",
" dtype=str,\n",
" usecols=use_columns\n",
" )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]\n",
" \n",
" # Set columns to treat as integers\n",
" _df[int_cols] = _df[int_cols].astype(int)\n",
"\n",
" # Filter by specified technology codes \n",
" _df = _df[_df['technology'].isin(tech_codes)] \n",
" \n",
" # Add the DataFrame to the list of DataFrames\n",
" if len(_df) > 0:\n",
" dataframes.append(_df)\n",
"\n",
"# Concatenate the list of BDC DataFrames into a single BDC DataFrame\n",
"bdc_df = pd.concat(dataframes)\n",
"\n",
"# Clear initial file read dataframes to conserve resources/memeory\n",
"for _df in dataframes: \n",
" del(_df)\n",
"del(dataframes)\n",
"\n",
"# Print a count of BDC records for the state\n",
"f\"Total State BDC records: {len(bdc_df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the BDC dataset (optional)\n",
"bdc_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### c. Read Location Fabric data: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a list of columns to read. By not reading all columns, memory is optimized\n",
"use_columns = [\n",
" 'location_id',\n",
" 'block_geoid',\n",
" 'latitude',\n",
" 'longitude',\n",
" 'bsl_flag',\n",
" ]\n",
"\n",
"# Read the un-zipped Location Fabric CSV file into a DataFrame, ensure to only read the selected state records\n",
"state_loc_fab_df = pd.read_csv(\n",
" loc_fab_path, \n",
" dtype=str,\n",
" usecols=use_columns\n",
" )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]\n",
"\n",
"# Filter to ensure only BSL's are included in the analysis \n",
"state_loc_fab_df = state_loc_fab_df[state_loc_fab_df['bsl_flag']=='True'] \n",
" \n",
"# Print a count of BSL records for the state\n",
"f\"Total BSL's: {len(state_loc_fab_df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the Location Fabric dataset (optional)\n",
"state_loc_fab_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### d. Location Fabric & BDC Join: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Some BSL's may not have a service level reported within the BDC data\n",
"# Left join the Location Fabric BSL to the BDC dataset, Nulls will be labled as \"Unserved\"\n",
"df = state_loc_fab_df.merge(bdc_df, on='location_id', how='left', suffixes=('', '_df2_'))\n",
"\n",
"# Remove columns duplicated in the join\n",
"dup_cols = [c for c in df.columns if c.endswith('_df2_')]\n",
"df = df.drop(columns=dup_cols)\n",
"\n",
"# Clear up some memmory by removing the BDC and location fabric dataframes\n",
"del(bdc_df)\n",
"del(state_loc_fab_df)\n",
"\n",
"# Print a count of join records for the state\n",
"f\"Total join records: {len(df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the joined dataset (optional)\n",
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### e. Location max service level determination:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. Methodology references:\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calculate_service_level(row):\n",
" \"\"\"Function calculates the service level based on service level criteria.\n",
" The enumerated service level is calculated for a given row in a pandas dataframe/series.\n",
" An enumerated service level rank value is returned. \n",
" \n",
" Parameters:\n",
" pd.Series:\n",
" A Pandas row of data containing the columns:\n",
" -'max_advertised_download_speed'\n",
" -'max_advertised_upload_speed'\n",
" -'low_latency'\n",
"\n",
" Returns:\n",
" int: \n",
" An enumerated service level calculation results:\n",
" -1: 'Service Level Error'\n",
" 0: 'Unserved' \n",
" 1: 'Underserved'\n",
" 2: 'Served'\n",
"\n",
" Example:\n",
" >>> df['enum_service_level'] = df.apply(calculate_service_level, axis=1)\n",
" \"\"\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check if download or upload speed values are missing\n",
" if (\n",
" pd.isna(row['max_advertised_download_speed'])\n",
" or pd.isna(row['max_advertised_upload_speed'])\n",
" or pd.isna(row['low_latency'])\n",
" ):\n",
" return 0 # If speed values are missing, label as (0) \"Unserved\" \n",
" #------------------------------------------------------------------------------------------------------\n",
" # Check if the location has low latency\n",
" elif row['low_latency'] == 0:\n",
" return 0 # If Low latency is False (0), label as (0) \"Unserved\" \n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Unserved\" category\n",
" elif (\n",
" row['max_advertised_download_speed'] < 25\n",
" or row['max_advertised_upload_speed'] < 3\n",
" ):\n",
" return 0 # If speeds are below 25/3, label as (0) \"Unserved\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Underserved\" category\n",
" elif (\n",
" 25 <= row['max_advertised_download_speed'] < 100\n",
" or 3 <= row['max_advertised_upload_speed'] < 20\n",
" ):\n",
" return 1 # If speeds are at or above 25/3, but less than 100/20, label as (1) \"Underserved\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Served\" category\n",
" elif (\n",
" row['max_advertised_download_speed'] >= 100\n",
" or row['max_advertised_upload_speed'] >= 20\n",
" ):\n",
" return 2 # If speeds are equal to or above 100/20, label as (2) \"Served\"\n",
" #------------------------------------------------------------------------------------------------------\n",
" # with none of the criteria met, label with an (-1) \"Service Level Error\" \n",
" else:\n",
" return -1"
]
},
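{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: row-wise apply can be slow for states with millions of records. The sketch below is an assumed, equivalent vectorized alternative using numpy.select; it is not part of the NTIA methodology. The conditions mirror the elif chain in calculate_service_level, and their order matters because the first matching condition wins.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional vectorized sketch (assumption: mirrors calculate_service_level, evaluated in the same order)\n",
"import numpy as np\n",
"\n",
"def calculate_service_level_vectorized(frame):\n",
"    dl = frame['max_advertised_download_speed']\n",
"    ul = frame['max_advertised_upload_speed']\n",
"    ll = frame['low_latency']\n",
"    conditions = [\n",
"        dl.isna() | ul.isna() | ll.isna(),  # missing values -> (0) Unserved\n",
"        ll == 0,                            # not low latency -> (0) Unserved\n",
"        (dl < 25) | (ul < 3),               # below 25/3 -> (0) Unserved\n",
"        (dl < 100) | (ul < 20),             # at/above 25/3 but below 100/20 -> (1) Underserved\n",
"        (dl >= 100) & (ul >= 20),           # at/above 100/20 -> (2) Served\n",
"    ]\n",
"    return np.select(conditions, [0, 0, 0, 1, 2], default=-1)\n",
"\n",
"# Example usage (in place of the apply call below):\n",
"# df['enum_service_level'] = calculate_service_level_vectorized(df)\n"
]
},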
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Apply the calculate_service_level function to each row in the DataFrame\n",
"# This creates a new column called \"enum_service_level\" \n",
"df['enum_service_level'] = df.apply(calculate_service_level, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group by each BSL to obtain the highest reported service level\n",
"# This determines the max service level for each BSL\n",
"#\n",
"# The additional columns below are retained for additional subquery capablities. \n",
"# However, these are not required for the service level analysis and do increase \n",
"# the dataframe memory/size:\n",
"# - Optional Columns: 'block_geoid', 'latitude', 'longitude', 'bsl_flag'\n",
"\n",
"bsl_max_srvc_lvls_df = df.groupby(['location_id', \n",
" 'block_geoid', \n",
" 'latitude', \n",
" 'longitude', \n",
" 'bsl_flag']\n",
" )['enum_service_level'].max().reset_index()\n",
"\n",
"# Map the service level labels for each BSL\n",
"svc_lvl_map = {-1: 'Service Level Error', 0: 'Unserved', 1: 'Underserved', 2: 'Served'}\n",
"\n",
"bsl_max_srvc_lvls_df['service_level'] = bsl_max_srvc_lvls_df['enum_service_level'].map(svc_lvl_map)\n",
"\n",
"# Print a count of max service level records for the state\n",
"f\"Total max service level records: {len(bsl_max_srvc_lvls_df):,}\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### f. Data results and review:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### View sample results and export the complete max service level records for each location:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show the max service levels per location\n",
"# This will show a sample (n) records of the BSL max service levels\n",
"# Location IDs along with their corresponding maximum service levels\n",
"\n",
"# Set the number of rows to display\n",
"row_qty = 10\n",
"\n",
"# Ensure the specified number of rows can be displayed\n",
"if row_qty > pd.get_option('display.max_rows'): \n",
" pd.set_option('display.max_rows', row_qty)\n",
" \n",
"# Display the first (n) records\n",
"bsl_max_srvc_lvls_df.head(row_qty)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Export the service level results for every location, while ensuring\n",
"# that the files are no larger than 1 mil records so that excel can open \n",
"# and display all records. \n",
"\n",
"# Set the base path to include the file name prefix\n",
"base_out_path = os.path.join(results_path, f'{state_fips}_service_level_results')\n",
"\n",
"# get the dataframe row count\n",
"row_cnt = len(bsl_max_srvc_lvls_df)\n",
"\n",
"# Iterate over a chunks of the dataframe and export the results.\n",
"print(f'Exporting results to:')\n",
"for i in range(0, row_cnt, 1000000):\n",
" # Create a slice of the dataframe for each chunk number\n",
" df_chunk = bsl_max_srvc_lvls_df[i:(i + 1000000)]\n",
" # Export the slice to a CSV file.\n",
" file_path = f'{base_out_path}_rows({i+1}_{i+len(df_chunk)}).csv'\n",
" df_chunk.to_csv(file_path, index=False)\n",
" print(f'\\t- {file_path}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Summarize quanity of locations per service level category:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Get counts of unique service level values in the service_level column\n",
"val_cnts = (bsl_max_srvc_lvls_df['service_level'].value_counts())\n",
"# Convert the series to a dataframe\n",
"val_cnts_df = pd.DataFrame(val_cnts).reset_index()\n",
"# Set the dataframe column names\n",
"val_cnts_df.columns= ['service_level','location_qty']\n",
"# Show a state level count summary of BSL's per service level \n",
"display(val_cnts_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Export the service level summary table to csv\n",
"file_path = os.path.join(results_path, f'{state_fips}_service_level_summary.csv')\n",
"print(f'Exporting service level summary results to:\\n\\t- {file_path}')\n",
"val_cnts_df.to_csv(file_path, index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ArcGISPro",
"language": "Python",
"name": "python3"
},
"language_info": {
"file_extension": ".py",
"name": "python",
"version": "3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}