BEAD Service Level Determination Process

Purpose:

This notebook provides an overview of the key methodology and criteria used by the National Telecommunications and Information Administration (NTIA) to determine location service levels for the Broadband Equity, Access, and Deployment (BEAD) program. The sample code can be used to replicate this methodology for any given state with minimal resources, and to export the resulting Broadband Serviceable Location (BSL) service level data.

Table of contents

  1. Methodology & Documentation
    A. Key Criteria & Constants
      a. Service Levels
        i. How does the BEAD program define an “unserved” location?
        ii. How does the BEAD program define an “underserved” location?
        iii. Applied Service Level Criteria
        iv. Service Level Methodologies
      b. Location records
        i. Why must the BDC data be joined to the Location Fabric?
        ii. Why use the block_geoid column rather than the state column?
        iii. What is a Broadband Serviceable Location (BSL)?
  2. Code sample
    A. Analysis overview and initial setup
    B. Notebook Setup
    C. Parameters
    D. Analytic Constants
    E. Data Processing
      a. Obtain input file info
      b. Read BDC data
      c. Read Location Fabric data
      d. Location Fabric & BDC Join
      e. Location max service level determination
      f. Data results and review

1. Methodology & Documentation:

A. Key Criteria & Constants:

a. Service Levels:

i. How does the BEAD program define an “unserved” location?
Section I.C.bb. of the NOFO defines unserved locations as locations lacking reliable broadband service, or with broadband service offering speeds below 25 megabits per second (Mbps) downstream/3 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; cable modem/hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. Locations served by satellite or purely unlicensed spectrum will also be considered unserved. See the BEAD FAQs.
ii. How does the BEAD program define an “underserved” location?
Section I.C.cc. of the NOFO defines underserved locations as locations identified as having access to reliable broadband service of at least 25 Mbps downstream/3 Mbps upstream, but less than 100 Mbps downstream/20 Mbps upstream, at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; cable modem/hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. See the BEAD FAQs.
iii. Applied Service Level Criteria:

Based on the definition of "Reliable broadband" stated above, NTIA includes the technology codes listed below in the analysis of a location's max service level.

BDC codes for "Reliable broadband" deployed technology types:
  • 10 : Copper Wire
  • 40 : Coaxial Cable / HFC
  • 50 : Optical Carrier / Fiber to the Premises
  • 71 : Licensed Terrestrial Fixed Wireless
  • 72 : Licensed-by-Rule Terrestrial Fixed Wireless
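
For convenience in later steps, these codes and their names can be kept together in a small lookup; a minimal sketch (an illustrative constant, not part of the NTIA sample):

# Illustrative lookup of the "Reliable broadband" BDC technology codes
RELIABLE_TECH_NAMES = {
    10: "Copper Wire",
    40: "Coaxial Cable / HFC",
    50: "Optical Carrier / Fiber to the Premises",
    71: "Licensed Terrestrial Fixed Wireless",
    72: "Licensed-by-Rule Terrestrial Fixed Wireless",
}
tech_codes = list(RELIABLE_TECH_NAMES)  # [10, 40, 50, 71, 72]; matches the constant defined in the code sample below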

Based on the FCC definition of "low latency" in the BDC data specification, NTIA classifies service availability with latency above 100 milliseconds as unserved.

The BDC dataset indicates low latency status with Boolean codes:
  • 0 : False (not low latency; above 100 milliseconds)
  • 1 : True (low latency; 100 milliseconds or less)
Resulting Service Levels Defined:
  • Unserved: speeds below 25/3 Mbps or NULL, or without low latency (low_latency = 0)
  • Underserved: speeds at or above 25/3 Mbps but below 100/20 Mbps, with low latency (low_latency = 1)
  • Served: speeds at or above 100/20 Mbps, with low latency (low_latency = 1)

For example, a location whose best reported offering is 80/10 Mbps with low latency is Underserved, while one with 100/20 Mbps and low latency is Served.

For more info, see the FCC's Data Spec. for BDC Public Data Downloads.

iv. Service Level Methodologies:

Databricks PySpark / Microsoft SQL Server:

Once the BDC data is filtered by the applied service level criteria and joined to the Location Fabric, NTIA applies the SQL query below to determine a location's max service level. Because the enumerated service levels are ordinal (0 = Unserved, 1 = Underserved, 2 = Served), taking the MAX per location yields the best service level reported at that location.
-- For each provider-reported record, determine the service level (sub-query),
-- then select the highest service level for each location
SELECT
  location_id,
  -- Take the max service level from the sub-query and map it to the appropriate string value
  (CASE
    WHEN MAX(service_level) = 0 THEN 'Unserved'     -- if the max value for a location is 0, it is Unserved
    WHEN MAX(service_level) = 1 THEN 'Underserved'  -- if the max value for a location is 1, it is Underserved
    WHEN MAX(service_level) = 2 THEN 'Served'       -- if the max value for a location is 2, it is Served
  END) AS loc_max_service_level
FROM (
  SELECT
    *,
    (CASE
      -- When a provider-level record meets the criteria for Unserved, return 0
      -- Unserved: speeds below 25/3 Mbps or NULL, or without low latency (low_latency = 0)
      WHEN low_latency = 0 THEN 0
      WHEN max_advertised_download_speed IS NULL THEN 0
      WHEN max_advertised_upload_speed IS NULL THEN 0
      WHEN max_advertised_download_speed < 25 THEN 0
      WHEN max_advertised_upload_speed < 3 THEN 0

      -- When a provider-level record meets the criteria for Underserved, return 1
      -- Underserved: speeds at or above 25/3 Mbps but below 100/20 Mbps, with low latency (low_latency = 1)
      WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1
      WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1

      -- When a provider-level record meets the criteria for Served, return 2
      -- Served: speeds at or above 100/20 Mbps, with low latency (low_latency = 1)
      WHEN max_advertised_download_speed >= 100 THEN 2
      WHEN max_advertised_upload_speed >= 20 THEN 2
    END) AS service_level
  FROM bdc
) t1
-- Group by the location id to get one record per location
GROUP BY location_id
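
In Databricks, one way to run this query with PySpark is to register the filtered records as a temporary view and pass the SQL text to spark.sql. A minimal sketch, assuming the records are loaded in a Spark DataFrame named bdc_sdf and the query above is stored in a Python string named max_service_level_sql (both names are illustrative):

# Expose the DataFrame to Spark SQL under the table name used in the query ("bdc")
bdc_sdf.createOrReplaceTempView("bdc")

# Execute the max-service-level query and preview the per-location results
loc_levels_sdf = spark.sql(max_service_level_sql)
loc_levels_sdf.show(10)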

Python Pandas:

To obtain the same results using pandas, a function named "calculate_service_level" is provided in the notebook. The "calculate_service_level" function will calculate the service level based on the same service level criteria contained in the SQL sample above.

B. Location records:

i. Why must the BDC data be joined to the Location Fabric?
The Location Fabric contains all broadband serviceable locations (BSLs), whereas the BDC dataset contains only locations where a provider has reported service availability. Locations in the Location Fabric without corresponding service availability in the BDC dataset are labeled as "unserved". Without performing this join, these locations would not be counted in the results.
ii. Why use the block_geoid column rather than the state column?
While not numerous, misattributed records have been observed in the state column. The block_geoid column was found to be the most reliable column for isolating state records: the first two characters of block_geoid represent the state FIPS code.
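
A minimal illustration in pandas (assuming a DataFrame df with a block_geoid column; the FIPS value shown is an example):

# Keep only records whose census block GEOID starts with the state FIPS code
state_fips = "30"  # example: Montana; the code sample below derives this from the input file names
state_df = df[df["block_geoid"].str[:2] == state_fips]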
iii. What is a Broadband Serviceable Location (BSL)?
See the FCC Broadband Data Collection Help Center: What is the Location Fabric?

2. Code sample:

The Python sample code below demonstrates a method to determine service levels using standard Python libraries.

A. Analysis overview and initial setup:

  1. Obtain the FCC Broadband Data Collection (BDC) and corresponding Location Fabric datasets.
    1. Download and unzip the Location Fabric dataset and take note of the full file path.
    2. Download and place the Broadband Data Collection (BDC) zip files in a separate directory without any other files. Take note of the full directory path.
      Note: The BDC CSV files should remain zipped for the code sample to work properly and for memory optimization.
      The sample code below will use all of the zip files in the directory as data sources.
      Ensure a file for each "reliable broadband" technology type is downloaded. Reference the applied service level criteria for a list of technology types.
  2. Once the files are in place, adjust the analytic variables in the parameters section. Provide/update each path variable.
  3. Run the notebook with the updated variables. Sample results will be displayed in the notebook, while full results will be exported to CSV files.

B. Notebook Setup:

In [1]:
# Import the required python packages
import glob
import pandas as pd
import sys, os
In [2]:
# Display the environment python version
print("python version: ", sys.version)
python version:  3.9.16 [MSC v.1931 64 bit (AMD64)]

C. Parameters: (Requires code modifications)

In [3]:
# Set the directory/folder path that contains only the zipped BDC CSV files (**Required input parameter**)
bdc_dir_path = r"D:\Montana"
In [4]:
# Set the path to the location Fabric FCC "Active" BSL CSV file (**Required input parameter**)
loc_fab_path = r"D:\LocationFabric_123122\FCC_Active_BSL_12312022_V2.1.csv"
In [5]:
# Set the output directory/folder path where the results will be exported (**Required input parameter**)
results_path = r"D:\results"

D. Analytic Constants:

Technology code constants:

See the applied service level criteria for more details.
In [6]:
# Set the technology codes to use in the BDC service level analysis
# As a standard, NTIA includes technology codes (10, 40, 50, 71, 72) in the analysis of a location's max service level
tech_codes = [10, 40, 50, 71, 72]

E. Data Processing:

a. Obtain input file info:

In [7]:
# Get a list of all zipped BDC CSV files in the directory
zipped_csv_files = glob.glob(f"{bdc_dir_path}/*.zip")

# Iterate over the list of zipped CSV files to print each path and a file count
for number, zipped_file in enumerate(zipped_csv_files):
    print(number + 1, zipped_file)
1 D:\Montana\bdc_30_Cable_fixed_broadband_123122.zip
2 D:\Montana\bdc_30_Copper_fixed_broadband_123122.zip
3 D:\Montana\bdc_30_Fiber-to-the-Premises_fixed_broadband_123122.zip
4 D:\Montana\bdc_30_GSO-Satellite_fixed_broadband_123122.zip
5 D:\Montana\bdc_30_LBR-Fixed-Wireless_fixed_broadband_123122.zip
6 D:\Montana\bdc_30_Licensed-Fixed-Wireless_fixed_broadband_123122.zip
7 D:\Montana\bdc_30_NGSO-Satellite_fixed_broadband_123122.zip
8 D:\Montana\bdc_30_Unlicensed-Fixed-Wireless_fixed_broadband_123122.zip
In [8]:
# Get the state FIPS code from the BDC file names; the FIPS code will be used to query the Location Fabric
state_fips = zipped_csv_files[0].split('_')[1]
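
Note that split('_')[1] operates on the full file path, so a directory name containing underscores would shift the token positions. A slightly more defensive variant (under the same assumption that file names follow the bdc_<FIPS>_... pattern) parses only the base file name:

# Parse the FIPS code from the file name alone so that underscores in the
# directory path (e.g. D:\my_data\Montana) cannot shift the token position
state_fips = os.path.basename(zipped_csv_files[0]).split('_')[1]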

b. Read BDC data:

In [9]:
# Create a list of columns to read.  By not reading all available columns, memory usage is optimized
use_columns = [
                'location_id',
                'block_geoid',
                'technology',
                'max_advertised_download_speed',
                'max_advertised_upload_speed',
                'low_latency',
              ]

# Set the BDC columns to treat as integers
int_cols = [
            'technology', 'max_advertised_download_speed', 
            'max_advertised_upload_speed', 'low_latency'
           ]

# Create a list to store DataFrames
dataframes = []

# Iterate over the list of zipped CSV files
for zipped_file in zipped_csv_files:

    # Read the zipped BDC CSV file into a DataFrame, keeping only the selected state's records
    # All columns are read as type "object" (str), which ensures pandas.concat will not incur datatype errors
    _df = pd.read_csv(zipped_file, 
                      compression="zip", 
                      dtype=str,
                      usecols=use_columns
                     )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]
   
    # Set columns to treat as integers
    _df[int_cols] = _df[int_cols].astype(int)

    # Filter by specified technology codes 
    _df = _df[_df['technology'].isin(tech_codes)]   
    
    # Add the DataFrame to the list of DataFrames
    if len(_df) > 0:
        dataframes.append(_df)

# Concatenate the list of BDC DataFrames into a single BDC DataFrame
bdc_df = pd.concat(dataframes)

# Clear the initial file-read dataframes to conserve resources/memory
for _df in dataframes: 
    del(_df)
del(dataframes)

# Print a count of BDC records for the state
f"Total State BDC records:  {len(bdc_df):,}"
Out[9]:
'Total State BDC records:  1,139,160'
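For very large states, the per-file reads above can be broken into chunks to further bound peak memory. A sketch of the same loop using the chunksize argument of pd.read_csv (the chunk size shown is an arbitrary example):

# Chunked variant of the BDC read loop; applies the same state and technology filters
dataframes = []
for zipped_file in zipped_csv_files:
    for chunk in pd.read_csv(zipped_file, compression="zip", dtype=str,
                             usecols=use_columns, chunksize=500_000):
        chunk = chunk[chunk['block_geoid'].str.startswith(state_fips.zfill(2))]
        chunk[int_cols] = chunk[int_cols].astype(int)
        chunk = chunk[chunk['technology'].isin(tech_codes)]
        if len(chunk) > 0:
            dataframes.append(chunk)
bdc_df = pd.concat(dataframes)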
In [10]:
# Describe the data types for each column in the BDC dataset (optional)
bdc_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1139160 entries, 0 to 430363
Data columns (total 6 columns):
 #   Column                         Non-Null Count    Dtype 
---  ------                         --------------    ----- 
 0   location_id                    1139160 non-null  object
 1   technology                     1139160 non-null  int32 
 2   max_advertised_download_speed  1139160 non-null  int32 
 3   max_advertised_upload_speed    1139160 non-null  int32 
 4   low_latency                    1139160 non-null  int32 
 5   block_geoid                    1139160 non-null  object
dtypes: int32(4), object(2)
memory usage: 43.5+ MB

c. Read Location Fabric data:

In [11]:
# Create a list of columns to read.  By not reading all columns, memory is optimized
use_columns = [
                'location_id',
                'block_geoid',
                'latitude',
                'longitude',
                'bsl_flag',
              ]

# Read the unzipped Location Fabric CSV file into a DataFrame, keeping only the selected state's records
state_loc_fab_df = pd.read_csv(
                                loc_fab_path, 
                                dtype=str,
                                usecols=use_columns
                              )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]

# Filter to ensure only BSLs are included in the analysis
state_loc_fab_df = state_loc_fab_df[state_loc_fab_df['bsl_flag']=='True']   
    
# Print a count of BSL records for the state
f"Total BSL's:  {len(state_loc_fab_df):,}"
Out[11]:
"Total BSL's:  481,305"
In [12]:
# Describe the data types for each column in the Location Fabric dataset  (optional)
state_loc_fab_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 481305 entries, 66231854 to 73271184
Data columns (total 5 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   location_id  481305 non-null  object
 1   bsl_flag     481305 non-null  object
 2   block_geoid  481305 non-null  object
 3   latitude     481305 non-null  object
 4   longitude    481305 non-null  object
dtypes: object(5)
memory usage: 22.0+ MB

d. Location Fabric & BDC Join:

In [13]:
# Some BSLs may not have a service level reported within the BDC data
# Left join the Location Fabric BSLs to the BDC dataset; nulls will be labeled as "Unserved"
df = state_loc_fab_df.merge(bdc_df, on='location_id', how='left', suffixes=('', '_df2_'))

# Remove columns duplicated in the join
dup_cols = [c for c in df.columns if c.endswith('_df2_')]
df = df.drop(columns=dup_cols)

# Clear up some memory by removing the BDC and location fabric dataframes
del(bdc_df)
del(state_loc_fab_df)

# Print a count of join records for the state
f"Total join records:  {len(df):,}"
Out[13]:
'Total join records:  1,164,160'
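Because this is a left join, BSLs with no provider-reported service carry NaN in the BDC columns and will fall out as "Unserved" in the next step. A quick, optional sanity check (illustrative) counts them:

# Count BSLs that had no matching BDC record (NaN in the BDC columns after the left join)
unmatched_qty = df['technology'].isna().sum()
print(f"BSLs with no reported BDC service: {unmatched_qty:,}")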
In [14]:
# Describe the data types for each column in the joined dataset  (optional)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1164160 entries, 0 to 1164159
Data columns (total 9 columns):
 #   Column                         Non-Null Count    Dtype  
---  ------                         --------------    -----  
 0   location_id                    1164160 non-null  object 
 1   bsl_flag                       1164160 non-null  object 
 2   block_geoid                    1164160 non-null  object 
 3   latitude                       1164160 non-null  object 
 4   longitude                      1164160 non-null  object 
 5   technology                     1139160 non-null  float64
 6   max_advertised_download_speed  1139160 non-null  float64
 7   max_advertised_upload_speed    1139160 non-null  float64
 8   low_latency                    1139160 non-null  float64
dtypes: float64(4), object(5)
memory usage: 88.8+ MB

e. Location max service level determination:

i. Methodology references:
See the 'Applied Service Level Criteria' and 'Service Level Methodologies' sections for criteria and methodology info related to the "calculate_service_level" function below.
In [15]:
def calculate_service_level(row):
    """Function calculates the service level based on service level criteria.
    The enumerated service level is calculated for a given row in a pandas dataframe/series.
    An enumerated service level rank value is returned. 
    
    Parameters:
        pd.Series:
            A Pandas row of data containing the columns:
                -'max_advertised_download_speed'
                -'max_advertised_upload_speed'
                -'low_latency'

    Returns:
        int: 
            The enumerated service level calculation result:
                -1: 'Service Level Error'
                 0: 'Unserved' 
                 1: 'Underserved'
                 2: 'Served'

    Example:
        >>> df['enum_service_level'] = df.apply(calculate_service_level, axis=1)
    """
    #------------------------------------------------------------------------------------------------------    
    # Check if download speed, upload speed, or latency values are missing
    if (
        pd.isna(row['max_advertised_download_speed'])
        or pd.isna(row['max_advertised_upload_speed'])
        or pd.isna(row['low_latency'])
    ):
        return 0  # If speed or latency values are missing, label as (0) "Unserved"
    #------------------------------------------------------------------------------------------------------
    # Check if the location has low latency
    elif row['low_latency'] == 0:
        return 0  # If Low latency is False (0), label as (0) "Unserved"      
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Unserved" category
    elif (
        row['max_advertised_download_speed'] < 25
        or row['max_advertised_upload_speed'] < 3
    ):
        return 0  # If speeds are below 25/3, label as (0) "Unserved"
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Underserved" category
    elif (
        25 <= row['max_advertised_download_speed'] < 100
        or 3 <= row['max_advertised_upload_speed'] < 20
    ):
        return 1  # If speeds are at or above 25/3, but less than 100/20, label as (1) "Underserved"
    #------------------------------------------------------------------------------------------------------    
    # Check download and upload speed conditions for "Served" category
    elif (
        row['max_advertised_download_speed'] >= 100
        or row['max_advertised_upload_speed'] >= 20
    ):
        return 2  # If speeds are equal to or above 100/20, label as (2) "Served"
    #------------------------------------------------------------------------------------------------------
    # With none of the criteria met, label as (-1) "Service Level Error"
    else:
        return -1
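Note that df.apply evaluates the function row by row, which can be slow on multi-million-row DataFrames. A vectorized sketch of the same criteria using numpy.select (an alternative approach, not part of the original sample):

import numpy as np

# Vectorized equivalent of calculate_service_level, applying the same criteria
dl = df['max_advertised_download_speed']
ul = df['max_advertised_upload_speed']
ll = df['low_latency']

unserved = dl.isna() | ul.isna() | ll.isna() | (ll == 0) | (dl < 25) | (ul < 3)
underserved = ~unserved & ((dl < 100) | (ul < 20))
served = ~unserved & ~underserved  # only 100/20+ with low latency remains

df['enum_service_level'] = np.select([unserved, underserved, served], [0, 1, 2], default=-1)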
In [16]:
# Apply the calculate_service_level function to each row in the DataFrame
# This creates a new column called "enum_service_level" 
df['enum_service_level'] = df.apply(calculate_service_level, axis=1)
In [17]:
# Group by each BSL to obtain the highest reported service level
# This determines the max service level for each BSL
#
# The additional columns below are retained for additional subquery capabilities.
# However, these are not required for the service level analysis and do increase 
# the dataframe memory/size:
#    - Optional Columns: 'block_geoid', 'latitude', 'longitude', 'bsl_flag'

bsl_max_srvc_lvls_df = df.groupby(['location_id', 
                                   'block_geoid', 
                                   'latitude', 
                                   'longitude', 
                                   'bsl_flag']
                                 )['enum_service_level'].max().reset_index()

# Map the service level labels for each BSL
svc_lvl_map = {-1: 'Service Level Error', 0: 'Unserved', 1: 'Underserved', 2: 'Served'}

bsl_max_srvc_lvls_df['service_level'] = bsl_max_srvc_lvls_df['enum_service_level'].map(svc_lvl_map)

# Print a count of max service level records for the state
f"Total max service level records:  {len(bsl_max_srvc_lvls_df):,}"
Out[17]:
'Total max service level records:  481,305'

f. Data results and review:

View sample results and export the complete max service level records for each location:
In [18]:
# Show the max service levels per location
# This displays a sample of (n) records of the BSL max service levels:
# location IDs along with their corresponding maximum service levels

# Set the number of rows to display
row_qty = 10

# Ensure the specified number of rows can be displayed
if row_qty > pd.get_option('display.max_rows'): 
    pd.set_option('display.max_rows', row_qty)
    
# Display the first (n) records
bsl_max_srvc_lvls_df.head(row_qty)
Out[18]:
location_id block_geoid latitude longitude bsl_flag enum_service_level service_level
0 1010168455 301039635001050 46.21808159 -107.24951265 True 0 Unserved
1 1010168458 301039635001070 46.29335002 -107.23081778 True 2 Served
2 1010168459 301039635001203 46.15996168 -107.45016771 True 2 Served
3 1010168460 301039635001034 46.25113080 -107.34170687 True 0 Unserved
4 1010168461 301039635001203 46.16098078 -107.44901760 True 0 Unserved
5 1010168462 301039635001203 46.16113180 -107.44871257 True 2 Served
6 1010168464 301039635001028 46.29299798 -107.24436713 True 0 Unserved
7 1010168495 301039635001150 46.14980167 -107.08021272 True 0 Unserved
8 1010168496 301039635001044 46.17649833 -107.41926262 True 0 Unserved
9 1010168497 301039635001084 46.29238192 -107.23280198 True 0 Unserved
In [19]:
# Export the service level results for every location, while ensuring
# that each file contains no more than 1 million records so that Excel can
# open and display all records.

# Set the base path to include the file name prefix
base_out_path = os.path.join(results_path, f'{state_fips}_service_level_results')

# get the dataframe row count
row_cnt = len(bsl_max_srvc_lvls_df)

# Iterate over chunks of the dataframe and export the results.
print(f'Exporting results to:')
for i in range(0, row_cnt, 1000000):
    # Create a slice of the dataframe for each chunk number
    df_chunk = bsl_max_srvc_lvls_df[i:(i + 1000000)]
    # Export the slice to a CSV file.
    file_path = f'{base_out_path}_rows({i+1}_{i+len(df_chunk)}).csv'
    df_chunk.to_csv(file_path, index=False)
    print(f'\t- {file_path}')
Exporting results to:
	- D:\results\30_service_level_results_rows(1_481305).csv
Summarize quantity of locations per service level category:
In [20]:
# Get counts of unique service level values in the service_level column
val_cnts = (bsl_max_srvc_lvls_df['service_level'].value_counts())
# Convert the series to a dataframe
val_cnts_df = pd.DataFrame(val_cnts).reset_index()
# Set the dataframe column names
val_cnts_df.columns = ['service_level', 'location_qty']
# Show a state level count summary of BSL's per service level 
display(val_cnts_df)
service_level location_qty
0 Served 327253
1 Unserved 104592
2 Underserved 49460
In [21]:
# Export the service level summary table to csv
file_path = os.path.join(results_path, f'{state_fips}_service_level_summary.csv')
print(f'Exporting service level summary results to:\n\t- {file_path}')
val_cnts_df.to_csv(file_path, index=False)
Exporting service level summary results to:
	- D:\results\30_service_level_summary.csv