
BEAD Service Level Determination Process
¶

Purpose:¶

This notebook provides an overview of the key methodology and criteria used by the National Telecommunications and Information Administration (NTIA) to determine location service levels for the Broadband Equity, Access, and Deployment (BEAD) program. The sample code can be used to replicate this methodology for any given state with minimal resources, and to export the resulting Broadband Serviceable Location (BSL) service level data.

Table of contents¶

  1. Methodology & Documentation
    1. Key Criteria & Constants
      1. Service Levels
        1. How does the BEAD program define an “unserved” location?
        2. How does the BEAD program define an “underserved” location?
        3. Applied Service Level Criteria
        4. Service Level Methodologies
      2. Location records
        1. Why must the BDC data be joined to the Location Fabric?
        2. Why use the block_geoid column rather than the state column?
        3. What is a Broadband Serviceable Location (BSL)?
  2. Code sample
    1. Analysis overview and initial setup
    2. Notebook Setup
    3. Parameters
    4. Analytic Constants
    5. Data Processing
      1. a. Obtain input file info
      2. b. Read BDC data
      3. c. Read Location Fabric data
      4. d. Location Fabric & BDC Join
      5. e. Location max service level determination
      6. f. Data results and review

1. Methodology & Documentation:¶

A. Key Criteria & Constants:¶

a. Service Levels:¶

i. How does the BEAD program define an “unserved” location?¶
Section I.C.bb. of the NOFO defines unserved locations as locations lacking reliable broadband service or with broadband service offering speeds below 25 megabits per second (Mbps) downstream/3 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or a hybrid of licensed and unlicensed spectrum. Locations that are served by satellite or purely unlicensed spectrum will also be considered unserved. See the BEAD FAQs.
ii. How does the BEAD program define an “underserved” location?¶
Section I.C.cc. of the NOFO defines underserved locations as locations identified as having access to reliable broadband service of at least 25 Mbps downstream/3 Mbps upstream, but less than 100 Mbps downstream/20 Mbps upstream, at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or a hybrid of licensed and unlicensed spectrum. See the BEAD FAQs.
iii. Applied Service Level Criteria:¶

Based on the definition of "Reliable broadband" stated above, NTIA includes the technology codes listed below in the analysis of a location's max service level.

BDC codes for "Reliable broadband" deployed technology types:
  • 10 : Copper Wire
  • 40 : Coaxial Cable / HFC
  • 50 : Optical Carrier / Fiber to the Premises
  • 71 : Licensed Terrestrial Fixed Wireless
  • 72 : Licensed-by-Rule Terrestrial Fixed Wireless

Based on the FCC definition of "low latency" in the BDC data specification, NTIA classifies service availability with latency above 100 milliseconds as unserved.

The BDC dataset indicates low latency status with Boolean codes:
  • 0 : False (Not low latency - above 100 milliseconds)
  • 1 : True (low latency - at or less than 100 milliseconds)
Resulting service levels defined (illustrated in the sketch below):
  • Unserved: speeds below 25/3 Mbps, missing (NULL) values, or without low latency (low_latency=0)
  • Underserved: speeds at or above 25/3 Mbps but below 100/20 Mbps, with low latency (low_latency=1)
  • Served: speeds at or above 100/20 Mbps, with low latency (low_latency=1)
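To make these criteria concrete, the short sketch below spot-checks a few hypothetical provider records against the definitions above (the record values are illustrative, not drawn from BDC data):

# Illustrative spot-check of the service level criteria (hypothetical records)
examples = [
    # (download Mbps, upload Mbps, low_latency)
    (200, 50, 1),  # Served: at or above 100/20 with low latency
    (50, 10, 1),   # Underserved: at or above 25/3 but below 100/20
    (200, 50, 0),  # Unserved: latency above 100 milliseconds
    (10, 1, 1),    # Unserved: speeds below 25/3
]
for down, up, low_latency in examples:
    if low_latency == 0 or down < 25 or up < 3:
        level = 'Unserved'
    elif down < 100 or up < 20:
        level = 'Underserved'
    else:
        level = 'Served'
    print(down, up, low_latency, '->', level)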

For more information, see the FCC's Data Spec. for BDC Public Data Downloads.

iv. Service Level Methodologies:¶

Databricks PySpark / Microsoft SQL Server:

Once the BDC data is filtered by the applied service level criteria and joined to the Location Fabric, NTIA applies the SQL query below to determine each location's max service level.
-- For each provider-reported record, determine the service level (sub-query)
    -- Then select the highest service level for each location 
    SELECT
      location_id,
      -- Take the max service level from the sub-query and map it to the appropriate string value
      (CASE
        WHEN MAX(service_level) = 0 THEN 'Unserved'     -- if the max value for a location is 0, it is Unserved
        WHEN MAX(service_level) = 1 THEN 'Underserved'  -- if the max value for a location is 1, it is Underserved
        WHEN MAX(service_level) = 2 THEN 'Served'       -- if the max value for a location is 2, it is Served
      END) AS loc_max_service_level
    FROM (
      SELECT
        *,
        (CASE
          -- When a provider-level record meets the criteria for Unserved, return 0
          -- Unserved: Speeds below 25/3 Mbps or NULL OR without low_latency (low_latency=0)
          WHEN low_latency = 0 THEN 0
          WHEN max_advertised_download_speed IS NULL THEN 0
          WHEN max_advertised_upload_speed IS NULL THEN 0
          WHEN max_advertised_download_speed < 25 THEN 0
          WHEN max_advertised_upload_speed < 3 THEN 0

          -- When a provider-level record meets the criteria for Underserved, return 1
          -- Underserved: Speeds at or above 25/3 Mbps, but Below 100/20 Mbps with low_latency (low_latency=1)
          WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1
          WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1

          -- When a provider-level record meets the criteria for Served, return 2
          -- Served: Service Level at or above 100/20 Mbps with low_latency (low_latency=1)
          WHEN max_advertised_download_speed >= 100 THEN 2
          WHEN max_advertised_upload_speed >= 20 THEN 2
        END) AS service_level
      FROM bdc
    ) t1
    -- Group by the location id to get one record per location  
    GROUP BY location_id
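On Databricks, one way to run this query is to register the filtered, joined records as a temporary view and pass the SQL text to the session. A minimal PySpark sketch follows; the names joined_df and max_service_level_sql are assumptions (a DataFrame holding the filtered BDC records joined to the Location Fabric, and a string holding the query above), not objects from the NTIA notebook:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `joined_df` (assumed) holds the filtered BDC records joined to the Location Fabric
joined_df.createOrReplaceTempView("bdc")

# `max_service_level_sql` (assumed) holds the SQL query shown above
loc_max_service_levels = spark.sql(max_service_level_sql)
loc_max_service_levels.show(10)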

Python Pandas:

To obtain the same results using pandas, a function named "calculate_service_level" is provided in the notebook. The "calculate_service_level" function will calculate the service level based on the same service level criteria contained in the SQL sample above.
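For very large states, a vectorized alternative to a row-wise apply can compute the same enumeration considerably faster. The helper below is a sketch using numpy.select under the same criteria; it is not part of the original notebook:

import numpy as np

def vectorized_service_level(df):
    """Vectorized equivalent of calculate_service_level (same criteria and codes)."""
    down = df['max_advertised_download_speed']
    up = df['max_advertised_upload_speed']
    # Unserved (0): missing values, no low latency, or speeds below 25/3
    unserved = (
        down.isna() | up.isna() | df['low_latency'].isna()
        | (df['low_latency'] == 0) | (down < 25) | (up < 3)
    )
    # Underserved (1): not unserved, but below 100/20
    underserved = ~unserved & ((down < 100) | (up < 20))
    # Served (2): everything else, i.e. at or above 100/20 with low latency
    served = ~unserved & ~underserved
    return np.select([unserved, underserved, served], [0, 1, 2], default=-1)

Usage mirrors the row-wise version: df['enum_service_level'] = vectorized_service_level(df).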

B. Location records:¶

i. Why must the BDC data be joined to the Location Fabric?¶
The Location Fabric contains all of the broadband serviceable locations (BSLs), whereas the BDC dataset only contains locations where a provider has reported service availability. Locations in the Location Fabric without corresponding service availability in the BDC dataset are labeled "Unserved". Without this join, those locations would not be counted in the results.
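As a toy illustration of the join (fabricated location IDs and speeds, not real Fabric or BDC data):

import pandas as pd

# Three fabric BSLs, but providers reported service at only two of them
fabric = pd.DataFrame({'location_id': ['1001', '1002', '1003']})
bdc = pd.DataFrame({'location_id': ['1001', '1002'],
                    'max_advertised_download_speed': [100, 25]})

joined = fabric.merge(bdc, on='location_id', how='left')
# Location 1003 has no BDC record, so its speed is NaN and it will be
# labeled "Unserved"; an inner join would have dropped it entirely
print(joined)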
ii. Why use the block_geoid column rather than the state column?¶
While not numerous, some misattributed records have been observed in the state column. The block_geoid column was found to be the most reliable column for isolating state records; its first two characters are the state FIPS code.
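For example, a state's records can be isolated by the block_geoid prefix (a one-line sketch, assuming a DataFrame df with a block_geoid column of strings):

# Pennsylvania's FIPS code is 42; keep only records whose block_geoid starts with it
pa_records = df[df['block_geoid'].str[:2] == '42']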
iii. What is a Broadband Serviceable Location (BSL)?¶
See the FCC Broadband Data Collection Help Center: What is the Location Fabric?

2. Code sample:¶

The Python sample code below provides a method to determine service levels using standard Python libraries.

A. Analysis overview and initial setup:¶

  1. Obtain the FCC Broadband Data Collection (BDC) and corresponding Location Fabric datasets.
    1. Download and unzip the Location Fabric dataset and take note of the full file path.
    2. Download and place the Broadband Data Collection (BDC) zip files in a separate directory without any other files. Take note of the full directory path.
      Note: The BDC CSV files should remain zipped for the code sample to work properly and for memory optimization.
       The sample code below will use all of the zip files in the directory as data sources.
       Ensure a file for each "reliable broadband" technology type is downloaded. Reference the applied service level criteria for a list of technology types.
  2. Once the files are in place, adjust the analytic variables in the parameters section and provide/update each path variable (a pre-flight check sketch follows this list).
  3. Run the notebook with the updated variables. Sample results will be displayed in the notebook while full results are exported to CSV files.
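Before running the full notebook, a quick pre-flight check of the path variables can catch setup mistakes early. The sketch below is a hypothetical helper (not part of the NTIA notebook) that assumes the variables defined in the Parameters section:

import glob
import os

# Hypothetical pre-flight check; bdc_dir_path, loc_fab_path, and results_path
# are the path variables set in the Parameters section below
assert os.path.isfile(loc_fab_path), "Location Fabric CSV not found"
assert glob.glob(os.path.join(bdc_dir_path, "*.zip")), "No zipped BDC files found"
assert os.path.isdir(results_path), "Results directory does not exist"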

B. Notebook Setup:¶

In [22]:
# Import the required python packages
import glob
import pandas as pd
import sys, os
In [23]:
# Display the environment's Python version
print("Python version: ", sys.version)
Python version:  3.9.16 [MSC v.1931 64 bit (AMD64)]

C. Parameters (requires code modifications):¶

In [24]:
# Set the directory/folder path that contains only the zipped BDC CSV files (**Required input parameter**)
bdc_dir_path = r"D:\Pennsylvania"
In [25]:
# Set the path to the unzipped Location Fabric FCC "Active" BSL CSV file (**Required input parameter**)
loc_fab_path = r"D:\LocationFabric_123122\FCC_Active_BSL_12312022_V2.1.csv"
In [26]:
# Set the output directory/folder path where the results will be exported (**Required input parameter**)
results_path = r"D:\results"

D. Analytic Constants:¶

Technology codes constants:

See the applied service level criteria for more details.
In [27]:
# Set technology codes to use in the BDC service level analysis
# As a standard, NTIA includes technology codes (10, 40, 50, 71, 72) in the analysis of a location's max service level
tech_codes = [10, 40, 50, 71, 72]

E. Data Processing:¶

a. Obtain input file info:¶

In [28]:
# Get a list of all zipped BDC CSV files in the directory
zipped_csv_files = glob.glob(f"{bdc_dir_path}/*.zip")

# Iterate over the list of zipped CSV files to print each path and a file count
for number, zipped_file in enumerate(zipped_csv_files):
    print(number + 1, zipped_file)
1 D:\Pennsylvania\bdc_42_Cable_fixed_broadband.csv.zip
2 D:\Pennsylvania\bdc_42_Copper_fixed_broadband.csv.zip
3 D:\Pennsylvania\bdc_42_Fiber-to-the-Premises_fixed_broadband.csv.zip
4 D:\Pennsylvania\bdc_42_LBR-Fixed-Wireless_fixed_broadband.csv.zip
5 D:\Pennsylvania\bdc_42_Licensed-Fixed-Wireless_fixed_broadband.csv.zip
6 D:\Pennsylvania\bdc_42_Unlicensed-Fixed-Wireless_fixed_broadband.csv.zip
In [29]:
# Get the state FIPS code from a BDC file name; the FIPS code will be used to query the Location Fabric
state_fips = os.path.basename(zipped_csv_files[0]).split('_')[1]

b. Read BDC data:¶

In [30]:
# Create a list of columns to read.  By not reading all available columns, memory usage is optimized
use_columns = [
                'location_id',
                'block_geoid',
                'technology',
                'max_advertised_download_speed',
                'max_advertised_upload_speed',
                'low_latency',
              ]

# Set the BDC columns to treat as integers
int_cols = [
            'technology', 'max_advertised_download_speed', 
            'max_advertised_upload_speed', 'low_latency'
           ]

# Create a list to store DataFrames
dataframes = []

# Iterate over the list of zipped CSV files
for zipped_file in zipped_csv_files:

    # Read the zipped BDC CSV file into a DataFrame, keeping only the selected state's records
    # All columns are read as type "object" (str), which ensures pandas.concat will not incur datatype errors
    _df = pd.read_csv(zipped_file, 
                      compression="zip", 
                      dtype=str,
                      usecols=use_columns
                     )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]
   
    # Set columns to treat as integers
    _df[int_cols] = _df[int_cols].astype(int)

    # Filter by specified technology codes 
    _df = _df[_df['technology'].isin(tech_codes)]   
    
    # Add the DataFrame to the list of DataFrames
    if len(_df) > 0:
        dataframes.append(_df)

# Concatenate the list of BDC DataFrames into a single BDC DataFrame
bdc_df = pd.concat(dataframes)

# Clear the initial file-read dataframes to conserve resources/memory
del _df
del dataframes

# Print a count of BDC records for the state
f"Total State BDC records:  {len(bdc_df):,}"
Out[30]:
'Total State BDC records:  14,675,200'
In [31]:
# Describe the data types for each column in the BDC dataset (optional)
bdc_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 14675200 entries, 0 to 4773081
Data columns (total 6 columns):
 #   Column                         Dtype 
---  ------                         ----- 
 0   location_id                    object
 1   technology                     int32 
 2   max_advertised_download_speed  int32 
 3   max_advertised_upload_speed    int32 
 4   low_latency                    int32 
 5   block_geoid                    object
dtypes: int32(4), object(2)
memory usage: 559.8+ MB

c. Read Location Fabric data:¶

In [32]:
# Create a list of columns to read.  By not reading all columns, memory is optimized
use_columns = [
                'location_id',
                'block_geoid',
                'latitude',
                'longitude',
                'bsl_flag',
              ]

# Read the unzipped Location Fabric CSV file into a DataFrame, keeping only the selected state's records
state_loc_fab_df = pd.read_csv(
                                loc_fab_path, 
                                dtype=str,
                                usecols=use_columns
                              )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]

# Filter to ensure only BSLs are included in the analysis
state_loc_fab_df = state_loc_fab_df[state_loc_fab_df['bsl_flag']=='True']   
    
# Print a count of BSL records for the state
f"Total BSL's:  {len(state_loc_fab_df):,}"
Out[32]:
"Total BSL's:  4,871,282"
In [33]:
# Describe the data types for each column in the Location Fabric dataset  (optional)
state_loc_fab_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4871282 entries, 83486504 to 95108482
Data columns (total 5 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   location_id  object
 1   bsl_flag     object
 2   block_geoid  object
 3   latitude     object
 4   longitude    object
dtypes: object(5)
memory usage: 223.0+ MB

d. Location Fabric & BDC Join:¶

In [34]:
# Some BSLs may not have a service level reported within the BDC data
# Left join the Location Fabric BSLs to the BDC dataset; nulls will be labeled as "Unserved"
df = state_loc_fab_df.merge(bdc_df, on='location_id', how='left', suffixes=('', '_df2_'))

# Remove columns duplicated in the join
dup_cols = [c for c in df.columns if c.endswith('_df2_')]
df = df.drop(columns=dup_cols)

# Clear up some memory by removing the BDC and Location Fabric dataframes
del(bdc_df)
del(state_loc_fab_df)

# Print a count of join records for the state
f"Total join records:  {len(df):,}"
Out[34]:
'Total join records:  14,714,265'
In [35]:
# Describe the data types for each column in the joined dataset  (optional)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 14714265 entries, 0 to 14714264
Data columns (total 9 columns):
 #   Column                         Dtype  
---  ------                         -----  
 0   location_id                    object 
 1   bsl_flag                       object 
 2   block_geoid                    object 
 3   latitude                       object 
 4   longitude                      object 
 5   technology                     float64
 6   max_advertised_download_speed  float64
 7   max_advertised_upload_speed    float64
 8   low_latency                    float64
dtypes: float64(4), object(5)
memory usage: 1.1+ GB

e. Location max service level determination:¶

i. Methodology references:¶
See the 'Applied Service Level Criteria' and 'Service Level Methodologies' sections for criteria and methodology info related to the "calculate_service_level" function below.
In [36]:
def calculate_service_level(row):
    """Function calculates the service level based on service level criteria.
    The enumerated service level is calculated for a given row in a pandas dataframe/series.
    An enumerated service level rank value is returned. 
    
    Parameters:
        pd.Series:
            A Pandas row of data containing the columns:
                -'max_advertised_download_speed'
                -'max_advertised_upload_speed'
                -'low_latency'

    Returns:
        int: 
            An enumerated service level calculation result:
                -1: 'Service Level Error'
                 0: 'Unserved' 
                 1: 'Underserved'
                 2: 'Served'

    Example:
        >>> df['enum_service_level'] = df.apply(calculate_service_level, axis=1)
    """
    #------------------------------------------------------------------------------------------------------    
    # Check if download, upload, or latency values are missing
    if (
        pd.isna(row['max_advertised_download_speed'])
        or pd.isna(row['max_advertised_upload_speed'])
        or pd.isna(row['low_latency'])
    ):
        return 0  # If any required values are missing, label as (0) "Unserved"
    #------------------------------------------------------------------------------------------------------
    # Check if the location has low latency
    elif row['low_latency'] == 0:
        return 0  # If Low latency is False (0), label as (0) "Unserved"      
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Unserved" category
    elif (
        row['max_advertised_download_speed'] < 25
        or row['max_advertised_upload_speed'] < 3
    ):
        return 0  # If speeds are below 25/3, label as (0) "Unserved"
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Underserved" category
    elif (
        25 <= row['max_advertised_download_speed'] < 100
        or 3 <= row['max_advertised_upload_speed'] < 20
    ):
        return 1  # If speeds are at or above 25/3, but less than 100/20, label as (1) "Underserved"
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Served" category
    elif (
        row['max_advertised_download_speed'] >= 100
        or row['max_advertised_upload_speed'] >= 20
    ):
        return 2  # If speeds are equal to or above 100/20, label as (2) "Served"
    #------------------------------------------------------------------------------------------------------
    # If none of the criteria are met, label with (-1) "Service Level Error"
    else:
        return -1
In [37]:
# Apply the calculate_service_level function to each row in the DataFrame
# This creates a new column called "enum_service_level" 
df['enum_service_level'] = df.apply(calculate_service_level, axis=1)
In [38]:
# Group by each BSL to obtain the highest reported service level
# This determines the max service level for each BSL
#
# The additional columns below are retained for additional subquery capabilities.
# However, these are not required for the service level analysis and do increase
# the dataframe memory/size:
#    - Optional Columns: 'block_geoid', 'latitude', 'longitude', 'bsl_flag'

bsl_max_srvc_lvls_df = df.groupby(['location_id', 
                                   'block_geoid', 
                                   'latitude', 
                                   'longitude', 
                                   'bsl_flag']
                                 )['enum_service_level'].max().reset_index()

# Map the service level labels for each BSL
svc_lvl_map = {-1: 'Service Level Error', 0: 'Unserved', 1: 'Underserved', 2: 'Served'}

bsl_max_srvc_lvls_df['service_level'] = bsl_max_srvc_lvls_df['enum_service_level'].map(svc_lvl_map)

# Print a count of max service level records for the state
f"Total max service level records:  {len(bsl_max_srvc_lvls_df):,}"
Out[38]:
'Total max service level records:  4,871,282'

f. Data results and review:¶

View sample results and export the complete max service level records for each location:¶
In [39]:
# Show the max service levels per location
# This will show a sample of (n) records of the BSL max service levels:
# location IDs along with their corresponding maximum service levels

# Set the number of rows to display
row_qty = 10

# Ensure the specified number of rows can be displayed
if row_qty > pd.get_option('display.max_rows'): 
    pd.set_option('display.max_rows', row_qty)
    
# Display the first (n) records
bsl_max_srvc_lvls_df.head(row_qty)
Out[39]:
location_id block_geoid latitude longitude bsl_flag enum_service_level service_level
0 1013445211 420535302001138 41.47657134 -79.51189088 True 2 Served
1 1013445212 420535302001108 41.48116880 -79.44348404 True 0 Unserved
2 1013445213 420535301002109 41.33163685 -79.16130683 True 0 Unserved
3 1013445215 420535301002109 41.33240893 -79.14868456 True 0 Unserved
4 1013445216 420535301002109 41.32939262 -79.15527322 True 0 Unserved
5 1013445217 420535301002109 41.33501029 -79.15880658 True 0 Unserved
6 1013445218 420535301002109 41.33102189 -79.16010171 True 0 Unserved
7 1013445219 420535301002109 41.33150784 -79.16050175 True 0 Unserved
8 1013445220 420535301002109 41.33245293 -79.16186288 True 0 Unserved
9 1013445221 420535301002109 41.32961665 -79.15392809 True 0 Unserved
In [40]:
# Export the service level results for every location, while ensuring
# that no file exceeds 1 million records so that Excel can open
# and display all records.

# Set the base path to include the file name prefix
base_out_path = os.path.join(results_path, f'{state_fips}_service_level_results')

# get the dataframe row count
row_cnt = len(bsl_max_srvc_lvls_df)

# Iterate over chunks of the dataframe and export the results
print('Exporting results to:')
for i in range(0, row_cnt, 1000000):
    # Create a slice of the dataframe for each chunk number
    df_chunk = bsl_max_srvc_lvls_df[i:(i + 1000000)]
    # Export the slice to a CSV file.
    file_path = f'{base_out_path}_rows({i+1}_{i+len(df_chunk)}).csv'
    df_chunk.to_csv(file_path, index=False)
    print(f'\t- {file_path}')
Exporting results to:
	- D:\results\42_service_level_results_rows(1_1000000).csv
	- D:\results\42_service_level_results_rows(1000001_2000000).csv
	- D:\results\42_service_level_results_rows(2000001_3000000).csv
	- D:\results\42_service_level_results_rows(3000001_4000000).csv
	- D:\results\42_service_level_results_rows(4000001_4871282).csv
Summarize the quantity of locations per service level category:¶
In [41]:
# Get counts of unique service level values in the service_level column
val_cnts = bsl_max_srvc_lvls_df['service_level'].value_counts()
# Convert the series to a dataframe
val_cnts_df = pd.DataFrame(val_cnts).reset_index()
# Set the dataframe column names
val_cnts_df.columns = ['service_level', 'location_qty']
# Show a state level count summary of BSL's per service level 
display(val_cnts_df)
service_level location_qty
0 Served 4541619
1 Unserved 278512
2 Underserved 51151
In [42]:
# Export the service level summary table to csv
file_path = os.path.join(results_path, f'{state_fips}_service_level_summary.csv')
print(f'Exporting service level summary results to:\n\t- {file_path}')
val_cnts_df.to_csv(file_path, index=False)
Exporting service level summary results to:
	- D:\results\42_service_level_summary.csv