
CloudNativePG query logs cannot be parsed if the query contains JSON #844

Open

jmealo opened this issue Feb 19, 2025 · 2 comments

jmealo commented Feb 19, 2025

Thank you for adding CNPG support here:
5aa95bc

This appears to be a compounding issue: Loki truncates log lines at 16 KB.

With the auto_explain output, we exceed this limit by quite a bit.

I've observed that pgBadger cannot parse the following query:

WITH hex_cells AS (
            SELECT h3_geography_to_cells_buffered(
                ST_GeomFromGeoJSON($1)::geography,
                $2,
                $3
            ) AS hex_id
        ),
        search_point AS (
            SELECT ST_GeomFromGeoJSON($4)::geography AS geog
        )
        SELECT DISTINCT ON (s.status_id)
               s.status_id,
               s.virtual,
               s.asset_id,
               s.asset_name,
               s.organization_id,
               ash.asset_status,
               ash.status_source AS asset_status_source,
               ash.created AS updated_asset_status,
               ash.expires_at AS asset_status_expires_at,
               s.asset_status_mtbf,
               s.asset_status_mttr,
               psh.power_status,
               psh.status_source AS power_status_source,
               psh.created AS updated_power_status,
               psh.expires_at AS power_status_expires_at,
               s.power_status_mtbf,
               s.power_status_mttr,
               s.created,
               s.deleted,
               ST_AsGeoJSON(l.location) AS geojson,
               l.thoroughfare,
               l.dependent_thoroughfare,
               l.premise,
               l.sub_premise,
               l.dependent_locality,
               l.locality,
               l.administrative_area,
               l.administrative_area_2,
               l.postal_code,
               l.country,
               ST_Distance(l.location, sp.geog) as distance
          FROM assets.statuses AS s
               JOIN assets.status_hexagons AS sh
                    ON s.status_id = sh.status_id
               JOIN locations.locations AS l
                    ON s.location_id = l.location_id
               CROSS JOIN search_point sp
               LEFT JOIN assets.asset_status_history AS ash
                    ON s.status_id = ash.status_id
                    AND s.asset_status_history_last_created = ash.created
               LEFT JOIN assets.power_status_history AS psh
                    ON s.status_id = psh.status_id
                    AND s.power_status_history_last_created = psh.created
               JOIN hex_cells hc
                    ON sh.hex_id = hc.hex_id
         WHERE ST_DWithin(
                   l.location,
                   ST_GeomFromGeoJSON($5)::geography,
                   $6
               )
           AND NOT s.replaced
        
                   AND s.organization_id = $7
            ORDER BY s.status_id, distance

It appears that queries containing JSON (which, given the logging format, ends up nested inside the JSON log envelope) are not parsed properly.
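To illustrate the nesting, here is a minimal sketch (not pgBadger's actual parser, and a hypothetical reduced record using the field names from the sample below): CloudNativePG wraps each PostgreSQL log record in a JSON envelope, so the query text lives at `record.message` and may itself contain JSON literals such as GeoJSON. A line- or regex-oriented parser trips over the nested escaped quotes, whereas a JSON parser recovers the message intact.

```python
import json

# Hypothetical reduced CNPG log record; the GeoJSON literal inside the
# query text is what produces nested, escaped JSON in the final log line.
record = {
    "level": "info",
    "logger": "postgres",
    "msg": "record",
    "record": {
        "error_severity": "LOG",
        "message": 'duration: 2409.930 ms  plan:\nQuery Text: '
                   'SELECT ST_GeomFromGeoJSON(\'{"type":"Point"}\')::geography',
    },
}
log_line = json.dumps(record)  # what actually lands in the log stream

# Parsing the envelope as JSON recovers the message, nested JSON and all.
entry = json.loads(log_line)
message = entry["record"]["message"]
print(message.split("\n", 1)[0])  # → duration: 2409.930 ms  plan:
```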

@jmealo jmealo changed the title Cloud Native PG JSON query logs cannot be parsed if the query contains JSON CloudNativePG JSON query logs cannot be parsed if the query contains JSON Feb 19, 2025
@jmealo jmealo changed the title CloudNativePG JSON query logs cannot be parsed if the query contains JSON CloudNativePG query logs cannot be parsed if the query contains JSON Feb 19, 2025

jmealo commented Feb 19, 2025

I went to fix this and add a new test to the CNPG fixtures, and noticed that the log line had been truncated at 16,385 bytes.

I'm trying to track down when/where this happens. I'm not sure what the correct behavior should be, but the current behavior shows only the first line of the query, without any timing details (as they're cut off).

Here's an example of the entire log line:

{"level":"info","ts":"2025-02-18T18:20:23.25169938Z","logger":"postgres","msg":"record","logging_pod":"production-pg-common-13","record":{"log_time":"2025-02-18 18:20:23.250 UTC","user_name":"assets_api","database_name":"gisual","process_id":"2330","connection_from":"10.130.1.29:42296","session_id":"67b4c64f.91a","session_line_num":"576","command_tag":"SELECT","session_start_time":"2025-02-18 17:41:35 UTC","virtual_transaction_id":"78/929","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"duration: 2409.930 ms  plan:\nQuery Text: \n        WITH hex_cells AS (\n            SELECT h3_geography_to_cells_buffered(\n                ST_GeomFromGeoJSON('{\"coordinates\":[-76.21821,36.32361],\"type\":\"Point\"}')::geography,\n                7,\n                10.0\n            ) AS hex_id\n        ),\n        search_point AS (\n            SELECT ST_GeomFromGeoJSON('{\"coordinates\":[-76.21821,36.32361],\"type\":\"Point\"}')::geography AS geog\n        )\n        SELECT DISTINCT ON (s.status_id)\n               s.status_id,\n               s.virtual,\n               s.asset_id,\n               s.asset_name,\n               s.organization_id,\n               ash.asset_status,\n               ash.status_source AS asset_status_source,\n               ash.created AS updated_asset_status,\n               ash.expires_at AS asset_status_expires_at,\n               s.asset_status_mtbf,\n               s.asset_status_mttr,\n               psh.power_status,\n               psh.status_source AS power_status_source,\n               psh.created AS updated_power_status,\n               psh.expires_at AS power_status_expires_at,\n               s.power_status_mtbf,\n               s.power_status_mttr,\n               s.created,\n               s.deleted,\n               ST_AsGeoJSON(l.location) AS geojson,\n               l.thoroughfare,\n               l.dependent_thoroughfare,\n               l.premise,\n               l.sub_premise,\n               
l.dependent_locality,\n               l.locality,\n               l.administrative_area,\n               l.administrative_area_2,\n               l.postal_code,\n               l.country,\n               ST_Distance(l.location, sp.geog) as distance\n          FROM assets.statuses AS s\n               JOIN assets.status_hexagons AS sh\n                    ON s.status_id = sh.status_id\n               JOIN locations.locations AS l\n                    ON s.location_id = l.location_id\n               CROSS JOIN search_point sp\n               LEFT JOIN assets.asset_status_history AS ash\n                    ON s.status_id = ash.status_id\n                    AND s.asset_status_history_last_created = ash.created\n               LEFT JOIN assets.power_status_history AS psh\n                    ON s.status_id = psh.status_id\n                    AND s.power_status_history_last_created = psh.created\n               JOIN hex_cells hc\n                    ON sh.hex_id = hc.hex_id\n         WHERE ST_DWithin(\n                   l.location,\n                   ST_GeomFromGeoJSON('{\"coordinates\":[-76.21821,36.32361],\"type\":\"Point\"}')::geography,\n                   10.0\n               )\n           AND NOT s.replaced\n        \n                   AND s.organization_id = '953fd8c4-1b66-438e-b547-4b1d6734ab94'\n            ORDER BY s.status_id, distance\nUnique  (cost=36169.15..36169.16 rows=1 width=251) (actual time=2409.896..2409.904 rows=1 loops=1)\n  Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, ash.asset_status, ash.status_source, ash.created, ash.expires_at, s.asset_status_mtbf, s.asset_status_mttr, psh.power_status, psh.status_source, psh.created, psh.expires_at, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, (st_asgeojson(l.location, 9, 0)), l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, 
(st_distance(l.location, '0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, true))\n  Buffers: shared hit=1763\n  ->  Sort  (cost=36169.15..36169.16 rows=1 width=251) (actual time=2409.895..2409.902 rows=1 loops=1)\n        Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, ash.asset_status, ash.status_source, ash.created, ash.expires_at, s.asset_status_mtbf, s.asset_status_mttr, psh.power_status, psh.status_source, psh.created, psh.expires_at, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, (st_asgeojson(l.location, 9, 0)), l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, (st_distance(l.location, '0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, true))\n        Sort Key: s.status_id, (st_distance(l.location, '0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, true))\n        Sort Method: quicksort  Memory: 25kB\n        Buffers: shared hit=1763\n        ->  Nested Loop Left Join  (cost=19.21..36169.14 rows=1 width=251) (actual time=2409.882..2409.895 rows=1 loops=1)\n              Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, ash.asset_status, ash.status_source, ash.created, ash.expires_at, s.asset_status_mtbf, s.asset_status_mttr, psh.power_status, psh.status_source, psh.created, psh.expires_at, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, st_asgeojson(l.location, 9, 0), l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, st_distance(l.location, '0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, true)\n              Inner Unique: true\n              Buffers: shared hit=1763\n              ->  Nested Loop Left Join  (cost=18.79..36152.12 rows=1 width=243) (actual 
time=2409.857..2409.868 rows=1 loops=1)\n                    Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, s.asset_status_mtbf, s.asset_status_mttr, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, s.power_status_history_last_created, l.location, l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, ash.asset_status, ash.status_source, ash.created, ash.expires_at\n                    Inner Unique: true\n                    Buffers: shared hit=1759\n                    ->  Hash Join  (cost=18.37..36148.35 rows=1 width=227) (actual time=2409.821..2409.830 rows=1 loops=1)\n                          Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, s.asset_status_mtbf, s.asset_status_mttr, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, s.asset_status_history_last_created, s.power_status_history_last_created, l.location, l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country\n                          Inner Unique: true\n                          Hash Cond: (s.location_id = l.location_id)\n                          Buffers: shared hit=1755\n                          ->  Nested Loop  (cost=0.85..36120.95 rows=3764 width=126) (actual time=0.726..0.733 rows=1 loops=1)\n                                Output: s.status_id, s.virtual, s.asset_id, s.asset_name, s.organization_id, s.asset_status_mtbf, s.asset_status_mttr, s.power_status_mtbf, s.power_status_mttr, s.created, s.deleted, s.location_id, s.asset_status_history_last_created, s.power_status_history_last_created\n                                Inner Unique: true\n                                Buffers: shared hit=23\n                                ->  Nested Loop  (cost=0.43..13856.97 
rows=46946 width=8) (actual time=0.681..0.695 rows=4 loops=1)\n                                      Output: sh.status_id\n                                      Buffers: shared hit=7\n                                      ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=8) (actual time=0.662..0.664 rows=1 loops=1)\n                                            Output: h3_geography_to_cells_buffered('0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, 7, '10'::double precision)\n                                            ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)\n                                      ->  Index Scan using statuses_hex_id_idx on assets.status_hexagons sh  (cost=0.43..13.38 rows=47 width=16) (actual time=0.017..0.026 rows=4 loops=1)\n                                            Output: sh.status_id, sh.hex_id\n                                            Index Cond: (sh.hex_id = (h3_geography_to_cells_buffered('0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, 7, '10'::double precision)))\n                                            Buffers: shared hit=7\n                                ->  Index Scan using statuses_pkey on assets.statuses s  (cost=0.43..0.47 rows=1 width=126) (actual time=0.008..0.008 rows=0 loops=4)\n                                      Output: s.status_id, s.asset_id, s.virtual, s.replaced, s.asset_name, s.location_id, s.organization_id, s.asset_status_history_last_created, s.asset_status_history_last_non_unknown_created, s.asset_status_down_count, s.asset_status_repair_count, s.asset_status_mtbf, s.asset_status_mttr, s.asset_status_last_down, s.asset_status_calculated, s.power_status_history_last_created, s.power_status_history_last_non_unknown_created, s.power_status_down_count, s.power_status_repair_count, s.power_status_mtbf, s.power_status_mttr, s.power_status_last_down, s.power_status_calculated, s.created, s.deleted\n                                      
Index Cond: (s.status_id = sh.status_id)\n                                      Filter: ((NOT s.replaced) AND (s.organization_id = '953fd8c4-1b66-438e-b547-4b1d6734ab94'::uuid))\n                                      Rows Removed by Filter: 1\n                                      Buffers: shared hit=16\n                          ->  Hash  (cost=15.26..15.26 rows=181 width=117) (actual time=2409.086..2409.087 rows=276 loops=1)\n                                Output: l.location, l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, l.location_id\n                                Buckets: 1024  Batches: 1  Memory Usage: 73kB\n                                Buffers: shared hit=1732\n                                ->  Index Scan using locations_location_idx on locations.locations l  (cost=0.54..15.26 rows=181 width=117) (actual time=0.178..2408.843 rows=276 loops=1)\n                                      Output: l.location, l.thoroughfare, l.dependent_thoroughfare, l.premise, l.sub_premise, l.dependent_locality, l.locality, l.administrative_area, l.administrative_area_2, l.postal_code, l.country, l.location_id\n                                      Index Cond: (l.location && _st_expand('0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, '10'::double precision))\n                                      Filter: st_dwithin(l.location, '0101000020E6100000406A1327F70D53C04C546F0D6C294240'::geography, '10'::double precision, true)\n                                      Rows Removed by Filter: 353\n                                      Buffers: shared hit=1732\n                    ->  Append  (cost=0.42..3.69 rows=8 width=32) (actual time=0.030..0.031 rows=1 loops=1)\n                          Buffers: shared hit=4\n                          ->  Index Scan using asset_status_history_p0_pkey on assets.asset_status_history_p0 ash_1  
(cost=0.42..0.46 rows=1 width=32) (actual time=0.025..0.025 rows=1 loops=1)\n                                Output: ash_1.asset_status, ash_1.status_source, ash_1.created, ash_1.expires_at, ash_1.status_id\n                                Index Cond: ((ash_1.status_id = s.status_id) AND (ash_1.created = s.asset_status_history_last_created))\n                                Buffers: shared hit=4\n                          ->  Index Scan using asset_status_history_p1_pkey on assets.asset_status_history_p1 ash_2  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_2.asset_status, ash_2.status_source, ash_2.created, ash_2.expires_at, ash_2.status_id\n                                Index Cond: ((ash_2.status_id = s.status_id) AND (ash_2.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p2_pkey on assets.asset_status_history_p2 ash_3  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_3.asset_status, ash_3.status_source, ash_3.created, ash_3.expires_at, ash_3.status_id\n                                Index Cond: ((ash_3.status_id = s.status_id) AND (ash_3.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p3_pkey on assets.asset_status_history_p3 ash_4  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_4.asset_status, ash_4.status_source, ash_4.created, ash_4.expires_at, ash_4.status_id\n                                Index Cond: ((ash_4.status_id = s.status_id) AND (ash_4.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p4_pkey on assets.asset_status_history_p4 ash_5  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_5.asset_status, ash_5.status_source, ash_5.created, ash_5.expires_at, 
ash_5.status_id\n                                Index Cond: ((ash_5.status_id = s.status_id) AND (ash_5.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p5_pkey on assets.asset_status_history_p5 ash_6  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_6.asset_status, ash_6.status_source, ash_6.created, ash_6.expires_at, ash_6.status_id\n                                Index Cond: ((ash_6.status_id = s.status_id) AND (ash_6.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p6_pkey on assets.asset_status_history_p6 ash_7  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_7.asset_status, ash_7.status_source, ash_7.created, ash_7.expires_at, ash_7.status_id\n                                Index Cond: ((ash_7.status_id = s.status_id) AND (ash_7.created = s.asset_status_history_last_created))\n                          ->  Index Scan using asset_status_history_p7_pkey on assets.asset_status_history_p7 ash_8  (cost=0.42..0.46 rows=1 width=32) (never executed)\n                                Output: ash_8.asset_status, ash_8.status_source, ash_8.created, ash_8.expires_at, ash_8.status_id\n                                Index Cond: ((ash_8.status_id = s.status_id) AND (ash_8.created = s.asset_status_history_last_created))\n              ->  Append  (cost=0.42..3.82 rows=8 width=32) (actual time=0.015..0.016 rows=1 loops=1)\n                    Buffers: shared hit=4\n                    ->  Index Scan using power_status_history_p0_pkey on assets.power_status_history_p0 psh_1  (cost=0.42..0.47 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=1)\n                          Output: psh_1.power_status, psh_1.status_source, psh_1.created, psh_1.expires_at, psh_1.status_id\n                          Index Cond: ((psh_1.status_id = s.status_id) AND 
(psh_1.created = s.power_status_history_last_created))\n                          Buffers: shared hit=4\n                    ->  Index Scan using power_status_history_p1_pkey on assets.power_status_history_p1 psh_2  (cost=0.42..0.47 rows=1 width=32) (never executed)\n                          Output: psh_2.power_status, psh_2.status_source, psh_2.created, psh_2.expires_at, psh_2.status_id\n                          Index Cond: (
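Note that the log line above ends mid-record, which suggests a cheap detection sketch (an assumption on my part, not something pgBadger or CNPG does): a line cut off partway through a JSON record is no longer valid JSON, so a failed parse flags it as truncated.

```python
import json

def is_truncated(line: str) -> bool:
    """Treat any line that fails to parse as JSON as a truncated record."""
    try:
        json.loads(line)
        return False
    except json.JSONDecodeError:
        return True

full = '{"record": {"message": "duration: 1.0 ms"}}'
cut = full[:16]  # simulate a line cut off mid-record (e.g. at a 16 KB cap)

print(is_truncated(full))  # → False
print(is_truncated(cut))   # → True
```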

@jmealo jmealo closed this as completed Feb 19, 2025

jmealo commented Feb 19, 2025

TL;DR: Loki truncates log lines. If you're going to use it with pgBadger, you'll want to increase the max_line_length config option.
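For reference, a sketch of what that might look like; this is an assumption about the relevant settings, since in the Loki documentation the line-length limits appear under `limits_config` as `max_line_size` and `max_line_size_truncate` (names and defaults have varied across Loki releases, so check the docs for your version):

```yaml
# Loki server config fragment (assumed option names; verify against your version)
limits_config:
  max_line_size: 256KB          # raise the per-line cap above the default
  max_line_size_truncate: false # reject oversized lines instead of silently cutting them
```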

@jmealo jmealo reopened this Feb 19, 2025