Integrate Eia923 Q2 2024 Data (#3768)
* Add DOI for new 923

* Update release notes

* Update 923 package data

* Map unmapped plants and utilities

* update minmax row validation test

* Add description of duplicate rows to docs and docstrings

---------

Co-authored-by: Zane Selvans <[email protected]>
aesharpe and zaneselvans authored Aug 12, 2024
1 parent dd30d39 commit 61ec1a7
Showing 8 changed files with 92 additions and 18 deletions.
2 changes: 2 additions & 0 deletions docs/release_notes.rst
@@ -44,6 +44,8 @@ EIA 860
EIA 923
~~~~~~~
* Added EIA 923 early release data from 2023. See :issue:`3719` and PR :pr:`3721`.
* Added EIA 923 monthly data through May 2024 as part of the Q2 quarterly release. See
:issue:`3760` and :pr:`3768`.

EPA CEMS
~~~~~~~~
34 changes: 34 additions & 0 deletions docs/templates/eia923_child.rst.jinja
@@ -91,4 +91,38 @@ Data Estimates
Plants that did not respond or reported unverified data were recorded as estimates
rolled in with the state/fuel aggregates values reported under the plant id 99999.

Boiler Fuel Primary Keys
------------------------
The :ref:`core_eia923__monthly_boiler_fuel` table has a subtle primary key and contains duplicate rows.
The main primary key columns for the table are: ``plant_id_eia, boiler_id, energy_source_code, prime_mover_code,
report_date``. Some rows that share these values also differ in ``associated_combined_heat_power``, due
to mid-year retirement of units that are associated with combined heat and power systems, or in
``operator_name``, due to lenient standards for string columns (they all have the same ``operator_id``
value). We drop both the ``associated_combined_heat_power`` and ``operator_name`` fields from the final
normalized table, which leaves duplicate rows. Fortunately, these rows don't provide any conflicting information:
because they describe the same plant, when one row contains an NA value, the other contains a numeric value.
We can therefore drop duplicates based on which rows contain NA values, with no reconciliation of
conflicting values necessary.

There are additional duplicate rows with identical qualitative plant information. None of these
duplicates contain conflicting information either: every set of duplicate rows includes at least one row
containing solely NA and 0 values.

To address both issues at once, we drop all duplicate rows whose non-primary-key columns contain only
NA or 0 values. One side effect is that when every row in a set of duplicates contains only NA and 0
values, all of them get dropped. This leads to gaps in the data where certain months are missing. The
missing values can be assumed to be 0 or NA.
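
The rule described above can be sketched in pandas as follows. This is a minimal illustration, not the code from ``src/pudl/transform/eia923.py``: the function name ``drop_empty_duplicates``, the two value columns, and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Primary key columns as listed in the documentation text.
PK_COLS = [
    "plant_id_eia",
    "boiler_id",
    "energy_source_code",
    "prime_mover_code",
    "report_date",
]


def drop_empty_duplicates(df: pd.DataFrame, pk_cols: list[str]) -> pd.DataFrame:
    """Drop rows that share a primary key with another row and hold only NA/0 values.

    Hypothetical helper for illustration only.
    """
    value_cols = [col for col in df.columns if col not in pk_cols]
    # True for every member of a group of rows sharing the same primary key.
    is_dupe = df.duplicated(subset=pk_cols, keep=False)
    # True when all value columns are NA or 0.
    is_empty = df[value_cols].fillna(0).eq(0).all(axis=1)
    return df.loc[~(is_dupe & is_empty)].reset_index(drop=True)


nan = float("nan")
# Illustrative data; the value column names are assumed for the example.
sample = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1, 2],
        "boiler_id": ["B1", "B1", "B1", "B1", "B2"],
        "energy_source_code": ["NG"] * 5,
        "prime_mover_code": ["ST"] * 5,
        "report_date": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-02-01", "2024-02-01", "2024-01-01"]
        ),
        "fuel_consumed_units": [100.0, nan, 0.0, nan, 0.0],
        "ash_content_pct": [nan, 0.0, nan, nan, nan],
    }
)

deduped = drop_empty_duplicates(sample, PK_COLS)
print(deduped)
```

In this sample the January duplicate for plant 1 collapses to the row with a real value, both February rows for plant 1 disappear because each holds only NA and 0 values (the gap described above), and the plant 2 row survives because it has no duplicate, even though it is empty.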

Boiler Fuel Years
-----------------
The :ref:`core_eia923__monthly_boiler_fuel` table contains all months in a given year, even months for
which there is no data. At present, we don't truncate the data after the most recently integrated month,
so you will see all months.

Fluctuations in row count between quarterly updates are therefore due to the primary key quirks
described above, not to newly added months.
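
Until the table is truncated upstream, a user who only wants months that have actually been integrated could filter on ``report_date`` themselves. The sketch below assumes a monthly ``report_date`` column holding first-of-month timestamps; the helper name ``truncate_after_month``, the cutoff value, and the sample data are hypothetical and not part of PUDL.

```python
import pandas as pd


def truncate_after_month(
    df: pd.DataFrame, last_month: str, date_col: str = "report_date"
) -> pd.DataFrame:
    """Keep only rows dated on or before the given month (hypothetical helper)."""
    cutoff = pd.Timestamp(last_month)
    return df.loc[df[date_col] <= cutoff].reset_index(drop=True)


nan = float("nan")
# Illustrative data: twelve monthly rows, with values only through May.
monthly = pd.DataFrame(
    {
        "report_date": pd.date_range("2024-01-01", periods=12, freq="MS"),
        "fuel_consumed_units": [10.0] * 5 + [nan] * 7,
    }
)

truncated = truncate_after_month(monthly, "2024-05-01")
print(truncated)
```

The cutoff has to be supplied by the caller, since nothing in the table itself records which month was integrated last.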




{%- endblock %}
20 changes: 10 additions & 10 deletions src/pudl/package_data/eia923/file_map.csv

Large diffs are not rendered by default.

Binary file modified src/pudl/package_data/glue/pudl_id_mapping.xlsx
Binary file not shown.
33 changes: 33 additions & 0 deletions src/pudl/package_data/glue/utility_id_pudl.csv
@@ -16342,3 +16342,36 @@ utility_id_pudl,utility_id_ferc1,utility_name_ferc1,utility_id_eia,utility_name_
16384,,,66290,NSF Energy One LLC
16385,,,66291,Portage Solar Plant
16386,,,66292,Desert Willow Energy Storage
16387,,,66317,"Kola Energy Storage, LLC"
16388,,,66336,"Wild Plains Wind Project, LLC"
16389,,,66352,SMT Ironman BESS LLC
16390,,,66345,"Sebree Solar, LLC"
16391,,,66354,"Anole Energy Storage, LLC"
16392,,,66351,Citadel BESS LLC
16393,,,66348,"Silver State South Storage, LLC"
16394,,,65860,"Madison Fields Solar Project, LLC"
16395,,,66346,"Silver Peak Solar, LLC"
16396,,,66314,JGT2 Energy LLC
16397,,,66318,"Zeta Solar, LLC"
16398,,,66334,Twin Lakes Solar LLC
16399,,,66319,"Heartwood Solar, LLC"
16400,,,66360,Reliability Design & Development LLC
16401,,,66320,"White Tail Solar, LLC"
16402,,,66350,Wigeon Whistle BESS LLC
16403,,,66338,Al Pastor BESS LLC
16404,,,66316,"Northumberland Solar I, LLC"
16405,,,66331,"Birch Creek Power, LLC"
16406,,,66347,"Placid Solar II, LLC"
16407,,,66335,REV Renewables LLC
16408,,,66294,"NSF Torrey Site 2, LLC"
16409,,,66300,NY CDG Genesee 1 LLC
16410,,,66321,NY CDG Montgomery 1 LLC
16411,,,66293,"NSF Torrey Site 3, LLC"
16412,,,66295,"NSF Torrey Site 1, LLC"
16413,,,66301,NY CDG Genesee 4 LLC
16414,,,66342,"Catalyze Joliet 1101 Cherry Hill Road Microgrid, LLC"
16415,,,66305,"Rio Vista Executive Boat & RV Storage, LLC"
16416,,,66304,PFMD LL Baltimore LLC
16417,,,66303,PFMD LL Jessup LLC
16418,,,66306,Town Of Cary
16419,,,66343,"Catalyze Rochelle Wiscold Drive Microgrid, LLC"
5 changes: 5 additions & 0 deletions src/pudl/transform/eia923.py
@@ -833,6 +833,11 @@ def _core_eia923__boiler_fuel(raw_eia923__boiler_fuel: pd.DataFrame) -> pd.DataF
* Create a fuel_type_code_pudl field that organizes fuel types into clean,
distinguishable categories.
* Combine year and month columns into a single date column.
* Drop duplicate rows with NA or 0 in all value columns.
Eventually we should truncate this table by the last year-month that was integrated.
Right now all months get integrated for a given year, regardless of whether there's
data for them.
Args:
raw_eia923__boiler_fuel: The raw ``raw_eia923__boiler_fuel`` dataframe.
2 changes: 1 addition & 1 deletion src/pudl/workspace/datastore.py
@@ -193,7 +193,7 @@ class ZenodoDoiSettings(BaseSettings):
eia860: ZenodoDoi = "10.5281/zenodo.11662381"
eia860m: ZenodoDoi = "10.5281/zenodo.11110602"
eia861: ZenodoDoi = "10.5281/zenodo.10204708"
-eia923: ZenodoDoi = "10.5281/zenodo.12656894"
+eia923: ZenodoDoi = "10.5281/zenodo.12721286"
eia930: ZenodoDoi = "10.5281/zenodo.10840078"
eiawater: ZenodoDoi = "10.5281/zenodo.10806016"
eiaaeo: ZenodoDoi = "10.5281/zenodo.10838488"
14 changes: 7 additions & 7 deletions test/validate/eia_test.py
@@ -46,17 +46,17 @@ def test_no_null_cols_eia(pudl_out_eia, live_dbs, cols, df_name):
@pytest.mark.parametrize(
"df_name,raw_rows,monthly_rows,annual_rows",
[
-("bf_eia923", 1_642_829, 1_642_829, 135_980),
+("bf_eia923", 1_642_806, 1_642_806, 135_980),
("bga_eia860", 153_487, 153_487, 153_487),
-("boil_eia860", 89_051, 89_051, 89_051),
+("boil_eia860", 89_050, 89_050, 89_050),
("frc_eia923", 673_343, 274_479, 26_709),
("gen_eia923", None, 5_494_932, 459_711),
-("gens_eia860", 590_881, 590_881, 590_881),
-("gf_eia923", 3_064_042, 3_064_042, 260_842),
+("gens_eia860", 591_256, 591_256, 591_256),
+("gf_eia923", 3_064_045, 3_064_045, 260_842),
("own_eia860", 95_104, 95_104, 95_104),
-("plants_eia860", 215_884, 215_884, 215_884),
-("pu_eia860", 214_965, 214_965, 214_965),
-("utils_eia860", 147_877, 147_877, 147_877),
+("plants_eia860", 216_206, 216_206, 216_206),
+("pu_eia860", 215_288, 215_288, 215_288),
+("utils_eia860", 147_922, 147_922, 147_922),
("emissions_control_equipment_eia860", 62_102, 62_102, 62_102),
("denorm_emissions_control_equipment_eia860", 62_102, 62_102, 62_102),
("boiler_emissions_control_equipment_assn_eia860", 83_977, 83_977, 83_977),