This notebook provides an overview of the key methodology and criteria used by the National Telecommunications and Information Administration (NTIA) to determine location service levels for the Broadband Equity, Access, and Deployment (BEAD) program. The sample code can be used to replicate this methodology for any given state with minimal resources and to export the resulting Broadband Serviceable Location (BSL) service level data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Methodology & Documentation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A. Key Criteria & Constants:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a. Service Levels: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. How does the BEAD program define an “unserved” location?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"Section I.C.bb. of the NOFO defines unserved locations as locations lacking reliable broadband service or with broadband service offering speeds below 25 megabits per second (Mbps) downstream/3 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/ Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. Locations that are served by satellite or purely unlicensed spectrum will also be considered unserved. See BEAD FAQ’s\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ii. How does the BEAD program define an “underserved” location?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"Section I.C.cc. of the NOFO defines underserved locations as locations that are identified as having access to reliable broadband service of at least 25 Mbps downstream/3 Mbps upstream but less than 100 Mbps downstream/20 Mbps upstream at a latency of 100 milliseconds or less. Reliable broadband means broadband service that the Broadband 5 DATA Maps show is accessible to a location via fiber-optic technology; Cable Modem/ Hybrid fiber-coaxial technology; digital subscriber line technology; or terrestrial fixed wireless technology utilizing entirely licensed spectrum or using a hybrid of licensed and unlicensed spectrum. \n",
"See BEAD FAQ’s\n",
"
\n",
"\n",
"Based on the definition of \"Reliable broadband\" stated above, NTIA includes technology codes listed below in the analysis of a location's max service level.\n",
"\n",
"
\n",
"\n",
"\n",
"Based on the FCC definition of \"low latency\" in the BDC data specification, NTIA classifies service availability with latency above 100 milliseconds as unserved.\n",
"\n",
"\n",
"
\n",
" The BDC dataset indicates low latency status with Boolean codes:\n",
"
Once the BDC data is filtered by the applied service level criteria and joined to the location fabric, NTIA applies the SQL query below to determine a locations max service level.
\n",
"\n",
"
\n",
" \n",
"~~~sql\n",
" -- For each provider reported record, determine the service level (sub-query)\n",
" -- Then select the highest service level for each location \n",
" SELECT\n",
" location_id,\n",
" -- Get the max service level from the sub-query to determine the max service value, map it to an approperiate string value\n",
" (CASE\n",
" WHEN MAX(service_level) = 0 THEN 'Unserved' -- if the max value for a location is 0, it is Unserved\n",
" WHEN MAX(service_level) = 1 THEN 'Underserved' -- if the max value for a location is 1, it is Underserved\n",
" WHEN MAX(service_level) = 2 THEN 'Served' -- if the max value for a location is 2, it is Served\n",
" END) AS loc_max_service_level\n",
" FROM (\n",
" SELECT\n",
" *,\n",
" (CASE\n",
" -- When a provider level records meets the critera for Unserved, return 0\n",
" -- Unserved: Speeds below 25/3 Mbps or NULL OR without low_latency (low_latency=0)\n",
" WHEN low_latency = 0 THEN 0\n",
" WHEN max_advertised_download_speed IS NULL THEN 0\n",
" WHEN max_advertised_upload_speed IS NULL THEN 0\n",
" WHEN max_advertised_download_speed < 25 THEN 0\n",
" WHEN max_advertised_upload_speed < 3 THEN 0\n",
" \n",
" -- When a provider level records meets the critera for Underserved, return 1 \n",
" -- Underserved: Speeds at or above 25/3 Mbps, but Below 100/20 Mbps with low_latency (low_latency=1)\n",
" WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1\n",
" WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1\n",
" \n",
" -- When a provider level records meets the critera for Served, return 2\n",
" -- Served: Service Level at or above 100/20 Mbps with low_latency (low_latency=1)\n",
" WHEN max_advertised_download_speed >= 100 THEN 2\n",
" WHEN max_advertised_upload_speed >= 20 THEN 2\n",
" END) AS service_level\n",
" FROM bdc\n",
" ) t1\n",
" -- Group by the location id to get one record per location \n",
" GROUP BY location_id\n",
"~~~\n",
"
\n",
"\n",
"Python Pandas:\n",
"\n",
"\n",
"
To obtain the same results using pandas, a defined function named \"calculate_service_level\" is provided in the notebook. The \"calculate_service_level\" function wil calculate the service level based on the same service level criteria contained in the SQL sample above.\n",
"
\n",
"\n",
"
"
]
},
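{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SQL above can be sanity-checked locally. The sketch below is illustrative only and is not part of the NTIA workflow: it assumes the optional duckdb package, which can query a pandas DataFrame by variable name, and uses a tiny hypothetical provider-level table named bdc.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: run the documented service level SQL against a toy table\n",
"# Assumes the optional duckdb package (pip install duckdb); values are hypothetical\n",
"import duckdb\n",
"import pandas as pd\n",
"\n",
"bdc = pd.DataFrame({\n",
"    'location_id': [1, 1, 2],\n",
"    'max_advertised_download_speed': [50, 200, 10],\n",
"    'max_advertised_upload_speed': [5, 10, 1],\n",
"    'low_latency': [1, 1, 1],\n",
"})\n",
"\n",
"query = \"\"\"\n",
"SELECT location_id,\n",
"       (CASE WHEN MAX(service_level) = 0 THEN 'Unserved'\n",
"             WHEN MAX(service_level) = 1 THEN 'Underserved'\n",
"             WHEN MAX(service_level) = 2 THEN 'Served' END) AS loc_max_service_level\n",
"FROM (\n",
"    SELECT *,\n",
"           (CASE WHEN low_latency = 0 THEN 0\n",
"                 WHEN max_advertised_download_speed IS NULL THEN 0\n",
"                 WHEN max_advertised_upload_speed IS NULL THEN 0\n",
"                 WHEN max_advertised_download_speed < 25 THEN 0\n",
"                 WHEN max_advertised_upload_speed < 3 THEN 0\n",
"                 WHEN max_advertised_download_speed BETWEEN 25 AND 99.999 THEN 1\n",
"                 WHEN max_advertised_upload_speed BETWEEN 3 AND 19.999 THEN 1\n",
"                 WHEN max_advertised_download_speed >= 100 THEN 2\n",
"                 WHEN max_advertised_upload_speed >= 20 THEN 2 END) AS service_level\n",
"    FROM bdc) t1\n",
"GROUP BY location_id\n",
"\"\"\"\n",
"\n",
"# Location 1's best offer (200/10 Mbps) is Underserved; location 2 (10/1 Mbps) is Unserved\n",
"duckdb.sql(query).df()\n"
]
},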
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### B. Location records:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. Why must the BDC data be joined to the Location Fabric?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"The location fabric contains all of the broadband serviceable locations (BSL) whereas, the BDC dataset only contains locations where a provider has reported service availability.\n",
"Locations in the location fabric without corresponding service availability in BDC dataset are then labeled as \"unserved\". Without performing this join, these locations will not be counted in the results.\n",
"
"
]
},
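{
"cell_type": "markdown",
"metadata": {},
"source": [
"The toy sketch below (hypothetical values, not part of the NTIA workflow) illustrates why the left join matters: a BSL with no BDC record survives the join with null service values and is later classified as \"Unserved\", whereas an inner join would drop it entirely.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration with hypothetical values: why a left join is required\n",
"import pandas as pd\n",
"\n",
"fabric = pd.DataFrame({'location_id': [1, 2, 3]})  # all BSLs in the Location Fabric\n",
"bdc = pd.DataFrame({'location_id': [1, 2],         # only provider-reported locations\n",
"                    'max_advertised_download_speed': [100, 25]})\n",
"\n",
"# location_id 3 keeps a row with NaN speeds and is later labeled \"Unserved\";\n",
"# an inner join would silently remove it from the results\n",
"fabric.merge(bdc, on='location_id', how='left')\n"
]
},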
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ii. Why use the block_geoid column rather than the state column?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
" While not numerous, some misattribution has occurred in the state column. The block_geoid column was found to be the most reliable column for isolating state records. \n",
" The first two characters of the block_geoid column represent the state FIPS code. \n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### iii. What is a Broadband Serviceable Location (BSL)?"
]
},
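{
"cell_type": "markdown",
"metadata": {},
"source": [
"The FCC defines a broadband serviceable location as a business or residential location in the United States at which fixed broadband Internet access service is, or can be, installed. The bsl_flag column in the Location Fabric identifies these locations, and the analysis below is limited to records where bsl_flag is true.\n"
]
},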
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Downlaod and place the Broadband Data Collection (BDC) zipfiles in a separate directory without any other files. Take note of the full directory path.\n",
"
Note: The BDC CSV files should remain zipped for the code sample to work properly and for memory optimization.\n",
" The sample code below will use all of the zipfiles in the directory as data sources.\n",
" Ensure a file for each \"reliable broadband\" technology type is downloaded. Reference the applied service level criteria for a list of technology types.\n",
"\t\t\t
\n",
" \n",
"
Once the files are in place, adjust the analytic variables in the parameters section. Provide/update each path variable.
\n",
"
Run the notebook with the updated variables. Sample results will be displyed in the notebook while full results will be exported to csv files.
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set technology codes to use in the BDC servicel level analyis\n",
"# As a standard, NTIA includes technology codes (10, 40, 50, 71, 72) in the analysis of a location's max service level\n",
"tech_codes = [10, 40, 50, 71, 72]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### E. Data Processing:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a. Obtain input file info: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Get a list of all zipped BDC CSV files in the directory\n",
"zipped_csv_files = glob.glob(f\"{bdc_dir_path}/*.zip\")\n",
"\n",
"# Iterate over the list of zipped CSV files to print each path and a file count\n",
"for number, zipped_file in enumerate(zipped_csv_files):\n",
" print(number + 1, zipped_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the state FIPS code for the BDC files, the FIPS code will be used to query the location fabric\n",
"state_fips = zipped_csv_files[0].split('_')[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### b. Read BDC data: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a list of columns to read. By not reading all available columns, memory usage is optimized\n",
"use_columns = [\n",
" 'location_id',\n",
" 'block_geoid',\n",
" 'technology',\n",
" 'max_advertised_download_speed',\n",
" 'max_advertised_upload_speed',\n",
" 'low_latency',\n",
" ]\n",
"\n",
"# Set the BDC columns to treat as integers\n",
"int_cols = [\n",
" 'technology', 'max_advertised_download_speed', \n",
" 'max_advertised_upload_speed', 'low_latency'\n",
" ]\n",
"\n",
"# Create a list to store DataFrames\n",
"dataframes = []\n",
"\n",
"# Iterate over the list of zipped CSV files\n",
"for zipped_file in zipped_csv_files:\n",
"\n",
" # Read the zipped BDC CSV file into a DataFrame, ensure to only read the selected state records\n",
" # All columns will be of type \"object\" (str) which ensures pandas.concat will not incure datatype errors\n",
" _df = pd.read_csv(zipped_file, \n",
" compression=\"zip\", \n",
" dtype=str,\n",
" usecols=use_columns\n",
" )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]\n",
" \n",
" # Set columns to treat as integers\n",
" _df[int_cols] = _df[int_cols].astype(int)\n",
"\n",
" # Filter by specified technology codes \n",
" _df = _df[_df['technology'].isin(tech_codes)] \n",
" \n",
" # Add the DataFrame to the list of DataFrames\n",
" if len(_df) > 0:\n",
" dataframes.append(_df)\n",
"\n",
"# Concatenate the list of BDC DataFrames into a single BDC DataFrame\n",
"bdc_df = pd.concat(dataframes)\n",
"\n",
"# Clear initial file read dataframes to conserve resources/memeory\n",
"for _df in dataframes: \n",
" del(_df)\n",
"del(dataframes)\n",
"\n",
"# Print a count of BDC records for the state\n",
"f\"Total State BDC records: {len(bdc_df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the BDC dataset (optional)\n",
"bdc_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### c. Read Location Fabric data: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a list of columns to read. By not reading all columns, memory is optimized\n",
"use_columns = [\n",
" 'location_id',\n",
" 'block_geoid',\n",
" 'latitude',\n",
" 'longitude',\n",
" 'bsl_flag',\n",
" ]\n",
"\n",
"# Read the un-zipped Location Fabric CSV file into a DataFrame, ensure to only read the selected state records\n",
"state_loc_fab_df = pd.read_csv(\n",
" loc_fab_path, \n",
" dtype=str,\n",
" usecols=use_columns\n",
" )[lambda df: df['block_geoid'].str.startswith(state_fips.zfill(2))]\n",
"\n",
"# Filter to ensure only BSL's are included in the analysis \n",
"state_loc_fab_df = state_loc_fab_df[state_loc_fab_df['bsl_flag']=='True'] \n",
" \n",
"# Print a count of BSL records for the state\n",
"f\"Total BSL's: {len(state_loc_fab_df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the Location Fabric dataset (optional)\n",
"state_loc_fab_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### d. Location Fabric & BDC Join: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Some BSL's may not have a service level reported within the BDC data\n",
"# Left join the Location Fabric BSL to the BDC dataset, Nulls will be labled as \"Unserved\"\n",
"df = state_loc_fab_df.merge(bdc_df, on='location_id', how='left', suffixes=('', '_df2_'))\n",
"\n",
"# Remove columns duplicated in the join\n",
"dup_cols = [c for c in df.columns if c.endswith('_df2_')]\n",
"df = df.drop(columns=dup_cols)\n",
"\n",
"# Clear up some memmory by removing the BDC and location fabric dataframes\n",
"del(bdc_df)\n",
"del(state_loc_fab_df)\n",
"\n",
"# Print a count of join records for the state\n",
"f\"Total join records: {len(df):,}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Describe the data types for each column in the joined dataset (optional)\n",
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### e. Location max service level determination:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### i. Methodology references:\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calculate_service_level(row):\n",
" \"\"\"Function calculates the service level based on service level criteria.\n",
" The enumerated service level is calculated for a given row in a pandas dataframe/series.\n",
" An enumerated service level rank value is returned. \n",
" \n",
" Parameters:\n",
" pd.Series:\n",
" A Pandas row of data containing the columns:\n",
" -'max_advertised_download_speed'\n",
" -'max_advertised_upload_speed'\n",
" -'low_latency'\n",
"\n",
" Returns:\n",
" int: \n",
" An enumerated service level calculation results:\n",
" -1: 'Service Level Error'\n",
" 0: 'Unserved' \n",
" 1: 'Underserved'\n",
" 2: 'Served'\n",
"\n",
" Example:\n",
" >>> df['enum_service_level'] = df.apply(calculate_service_level, axis=1)\n",
" \"\"\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check if download or upload speed values are missing\n",
" if (\n",
" pd.isna(row['max_advertised_download_speed'])\n",
" or pd.isna(row['max_advertised_upload_speed'])\n",
" or pd.isna(row['low_latency'])\n",
" ):\n",
" return 0 # If speed values are missing, label as (0) \"Unserved\" \n",
" #------------------------------------------------------------------------------------------------------\n",
" # Check if the location has low latency\n",
" elif row['low_latency'] == 0:\n",
" return 0 # If Low latency is False (0), label as (0) \"Unserved\" \n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Unserved\" category\n",
" elif (\n",
" row['max_advertised_download_speed'] < 25\n",
" or row['max_advertised_upload_speed'] < 3\n",
" ):\n",
" return 0 # If speeds are below 25/3, label as (0) \"Unserved\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Underserved\" category\n",
" elif (\n",
" 25 <= row['max_advertised_download_speed'] < 100\n",
" or 3 <= row['max_advertised_upload_speed'] < 20\n",
" ):\n",
" return 1 # If speeds are at or above 25/3, but less than 100/20, label as (1) \"Underserved\"\n",
" #------------------------------------------------------------------------------------------------------ \n",
" # Check download and upload speed conditions for \"Served\" category\n",
" elif (\n",
" row['max_advertised_download_speed'] >= 100\n",
" or row['max_advertised_upload_speed'] >= 20\n",
" ):\n",
" return 2 # If speeds are equal to or above 100/20, label as (2) \"Served\"\n",
" #------------------------------------------------------------------------------------------------------\n",
" # with none of the criteria met, label with an (-1) \"Service Level Error\" \n",
" else:\n",
" return -1"
]
},
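{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: row-wise apply can be slow for states with millions of records. The sketch below is an assumed, equivalent vectorized alternative using numpy.select; it is not part of the NTIA methodology. The conditions mirror the elif chain in calculate_service_level, and their order matters because the first matching condition wins.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional vectorized sketch (assumption: mirrors calculate_service_level, evaluated in the same order)\n",
"import numpy as np\n",
"\n",
"def calculate_service_level_vectorized(frame):\n",
"    dl = frame['max_advertised_download_speed']\n",
"    ul = frame['max_advertised_upload_speed']\n",
"    ll = frame['low_latency']\n",
"    conditions = [\n",
"        dl.isna() | ul.isna() | ll.isna(),  # missing values -> (0) Unserved\n",
"        ll == 0,                            # not low latency -> (0) Unserved\n",
"        (dl < 25) | (ul < 3),               # below 25/3 -> (0) Unserved\n",
"        (dl < 100) | (ul < 20),             # at/above 25/3 but below 100/20 -> (1) Underserved\n",
"        (dl >= 100) & (ul >= 20),           # at/above 100/20 -> (2) Served\n",
"    ]\n",
"    return np.select(conditions, [0, 0, 0, 1, 2], default=-1)\n",
"\n",
"# Example usage (in place of the apply call below):\n",
"# df['enum_service_level'] = calculate_service_level_vectorized(df)\n"
]
},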
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Apply the calculate_service_level function to each row in the DataFrame\n",
"# This creates a new column called \"enum_service_level\" \n",
"df['enum_service_level'] = df.apply(calculate_service_level, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group by each BSL to obtain the highest reported service level\n",
"# This determines the max service level for each BSL\n",
"#\n",
"# The additional columns below are retained for additional subquery capablities. \n",
"# However, these are not required for the service level analysis and do increase \n",
"# the dataframe memory/size:\n",
"# - Optional Columns: 'block_geoid', 'latitude', 'longitude', 'bsl_flag'\n",
"\n",
"bsl_max_srvc_lvls_df = df.groupby(['location_id', \n",
" 'block_geoid', \n",
" 'latitude', \n",
" 'longitude', \n",
" 'bsl_flag']\n",
" )['enum_service_level'].max().reset_index()\n",
"\n",
"# Map the service level labels for each BSL\n",
"svc_lvl_map = {-1: 'Service Level Error', 0: 'Unserved', 1: 'Underserved', 2: 'Served'}\n",
"\n",
"bsl_max_srvc_lvls_df['service_level'] = bsl_max_srvc_lvls_df['enum_service_level'].map(svc_lvl_map)\n",
"\n",
"# Print a count of max service level records for the state\n",
"f\"Total max service level records: {len(bsl_max_srvc_lvls_df):,}\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### f. Data results and review:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### View sample results and export the complete max service level records for each location:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show the max service levels per location\n",
"# This will show a sample (n) records of the BSL max service levels\n",
"# Location IDs along with their corresponding maximum service levels\n",
"\n",
"# Set the number of rows to display\n",
"row_qty = 10\n",
"\n",
"# Ensure the specified number of rows can be displayed\n",
"if row_qty > pd.get_option('display.max_rows'): \n",
" pd.set_option('display.max_rows', row_qty)\n",
" \n",
"# Display the first (n) records\n",
"bsl_max_srvc_lvls_df.head(row_qty)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Export the service level results for every location, while ensuring\n",
"# that the files are no larger than 1 mil records so that excel can open \n",
"# and display all records. \n",
"\n",
"# Set the base path to include the file name prefix\n",
"base_out_path = os.path.join(results_path, f'{state_fips}_service_level_results')\n",
"\n",
"# get the dataframe row count\n",
"row_cnt = len(bsl_max_srvc_lvls_df)\n",
"\n",
"# Iterate over a chunks of the dataframe and export the results.\n",
"print(f'Exporting results to:')\n",
"for i in range(0, row_cnt, 1000000):\n",
" # Create a slice of the dataframe for each chunk number\n",
" df_chunk = bsl_max_srvc_lvls_df[i:(i + 1000000)]\n",
" # Export the slice to a CSV file.\n",
" file_path = f'{base_out_path}_rows({i+1}_{i+len(df_chunk)}).csv'\n",
" df_chunk.to_csv(file_path, index=False)\n",
" print(f'\\t- {file_path}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Summarize quanity of locations per service level category:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Get counts of unique service level values in the service_level column\n",
"val_cnts = (bsl_max_srvc_lvls_df['service_level'].value_counts())\n",
"# Convert the series to a dataframe\n",
"val_cnts_df = pd.DataFrame(val_cnts).reset_index()\n",
"# Set the dataframe column names\n",
"val_cnts_df.columns= ['service_level','location_qty']\n",
"# Show a state level count summary of BSL's per service level \n",
"display(val_cnts_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Export the service level summary table to csv\n",
"file_path = os.path.join(results_path, f'{state_fips}_service_level_summary.csv')\n",
"print(f'Exporting service level summary results to:\\n\\t- {file_path}')\n",
"val_cnts_df.to_csv(file_path, index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ArcGISPro",
"language": "Python",
"name": "python3"
},
"language_info": {
"file_extension": ".py",
"name": "python",
"version": "3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}