Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues on final product [EMODNETBIO-181] #1

Open
salvafern opened this issue Aug 9, 2024 · 2 comments
Open

Issues on final product [EMODNETBIO-181] #1

salvafern opened this issue Aug 9, 2024 · 2 comments
Assignees

Comments

@salvafern
Copy link

Hi @wstolte,

Here some issues Im finding in the file with the final product. Writing here for documentation.
Notice how the values of the variables present some aberrations.

e.g. I would not expect such high max and low min values of latitude, longitude or habitat suitability.

Longitude, latitude and aphiaid must be in ascendant order. Longitude must not have duplicated values.

The aphiaIDs in the file do not exist. The taxon_name and taxon_lsid seem corrupt.

Could you have a look? Maybe something went wrong when creating the file?

library(ncdf4)
library(dplyr)
is.sorted = Negate(is.unsorted)

nc <- nc_open("./product/foo.nc")
nc
#> File ./product/foo.nc (NC_FORMAT_CLASSIC):
#> 
#>      4 variables (excluding dimension variables):
#>         char taxon_name[string80,aphiaid]   
#>             standard_name: biological_taxon_name
#>             long_name: Scientific name of the taxa
#>         char taxon_lsid[string80,aphiaid]   
#>             standard_name: biological_taxon_lsid
#>             long_name: Life Science Identifier - World Register of Marine Species
#>         char crs[]   
#>             long_name: Coordinate Reference System
#>             geographic_crs_name: WGS 84
#>             grid_mapping_name: latitude_longitude
#>             reference_ellipsoid_name: WGS 84
#>             horizontal_datum_name: WGS 84
#>             prime_meridian_name: Greenwich
#>             longitude_of_prime_meridian: 0
#>             semi_major_axis: 6378137
#>             semi_minor_axis: 6356752.31424518
#>             inverse_flattening: 298.257223563
#>             spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
#>             GeoTransform: -180 0.08333333333333333 0 90 0 -0.08333333333333333 
#>         double habitat_suitability[lon,lat,aphiaid]   
#>             _FillValue: -99999
#>             long_name: Modelled habitat suitability for a species.
#> 
#>      4 dimensions:
#>         lon  Size:4361 
#>             units: degrees_east
#>             standard_name: longitude
#>             long_name: Longitude
#>         lat  Size:4209 
#>             units: degrees_north
#>             standard_name: latitude
#>             long_name: Latitude
#>         aphiaid  Size:4 
#>             long_name: Life Science Identifier - World Register of Marine Species
#>         string80  Size:80 (no dimvar)

# check lat/lon
lat <- ncvar_get(nc, "lat")
summary(lat)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#> -1.448e+307   5.300e+01   5.400e+01 -3.477e+303   5.600e+01  2.509e+288

is.sorted(lat)
#> [1] FALSE

any(duplicated(lat))
#> [1] FALSE

lon <- ncvar_get(nc, "lon")
summary(lon)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#>  -1.000e+05   0.000e+00   3.000e+00  1.308e+278   6.000e+00  5.705e+281

is.sorted(lon)
#> [1] FALSE

any(duplicated(lon))
#> [1] TRUE

# Check taxon
aphiaid <- ncvar_get(nc, "aphiaid")
is.sorted(aphiaid)
#> [1] FALSE

aphiaid
#> [1] -2073396244  1078560851   379889075  1078560798

ncvar_get(nc, "taxon_name")
#> [1] "\xa8\xde\xcf{@I\x87\xea;\030\xf9B@I\x87\xb5\xcdS#\n@I\x87\x81_\x8dL\xd1@I\x87L\xf1\xc7v\x98@I\x87\030\x84\001\xa0`@I\x86\xe4\026;\xca(@I\x86\xaf\xa8u\xf3\xef@I\x86{:\xb0\035\xb6@I\x86F\xcc\xeaG~@I\x86\022"
#> [2] "_$qE@I\x85\xdd\xf1^\x9b\r@I\x85\xa9\x83\x98\xc4\xd4@I\x85u\025\xd2\xee\x9c@I\x85@\xa8\r\030c@I\x85\f:GB+@I\x84\xd7́k\xf2@I\x84\xa3^\xbb\x95\xba@I\x84n\xf0\xf5\xbf\x81@I\x84:\x83/\xe9I@I\x84\006"            
#> [3] "\025j\023\020@I\x83ѧ\xa4<\xd8@I\x83\x9d9\xdef\x9f@I\x83h\xcc\030\x90f@I\x834^R\xba.@I\x82\xff\xf0\x8c\xe3\xf6@I\x82˂\xc7\r\xbd@I\x82\x97\025\0017\x84@I\x82b\xa7;aL@I\x82.9u\x8b\023@I\x81\xf9"              
#> [4] "˯\xb4\xdb@I\x81\xc5]\xe9ޢ@I\x81\x90\xf0$\bj@I\x81\\\x82^21@I\x81(\024\x98[\xf9@I\x80\xf3\xa6҅\xc0@I\x80\xbf9\f\xaf\x88@I\x80\x8a\xcbF\xd9O@I\x80V]\x81\003\027@I\x80!\xef\xbb,\xde@I\177\xed"

ncvar_get(nc, "taxon_lsid")
#> [1] "\x81\xf5V\xa6@I\177\xb9\024/\x80m@I\177\x84\xa6i\xaa5@I\177P8\xa3\xd3\xfc@I\177\033\xca\xdd\xfd\xc4@I~\xe7]\030'\x8b@I~\xb2\xefRQR@I~~\x81\x8c{\032@I~J\023Ƥ\xe1@I~\025\xa6"
#> [2] "8:\xf8p@I}\xac\xcau\"8@I}x\\\xafK\xff@I}C\xee\xe9u\xc7@I}\017\x81#\x9f\x8e@I|\xdb\023]\xc9V@I|\xa6\xa5\x97\xf3\035@I|r7\xd2\034\xe5@I|=\xca\fF\xac@I|\t\\Fpt@I{\xd4"        
#> [3] ";@I{\xa0\x80\xba\xc4\003@I{l\022\xf4\xed\xca@I{7\xa5/\027\x92@I{\0037iAY@Iz\xceɣk @Iz\x9a[ݔ\xe8@Ize\xee\027\xbe\xb0@Iz1\x80Q\xe8w@Iy\xfd\022\x8c\022>@Iy\xc8"              
#> [4] "\xa4\xc6<\006@Iy\x947"

# Check habitat suitability
habitat_suitability <- ncvar_get(nc, "habitat_suitability")
summary(habitat_suitability)
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#> -2.682e+154   0.000e+00   0.000e+00 -1.519e+152   0.000e+00  2.682e+154

Created on 2024-08-09 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Brussels
#>  date     2024-08-09
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.2)
#>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.2)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.2)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.2)
#>  ncdf4       * 1.22    2023-11-28 [1] CRAN (R 4.3.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.2)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.2)
#>  R.oo          1.26.0  2024-01-24 [1] CRAN (R 4.3.2)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.3.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.2)
#>  reprex        2.1.0   2024-01-11 [1] CRAN (R 4.3.2)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.3.3)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.3.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.2)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.3.3)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun          0.43    2024-03-25 [1] CRAN (R 4.3.3)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.2)
#> 
#>  [1] /home/localadmin/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@wstolte wstolte self-assigned this Aug 12, 2024
@salvafern
Copy link
Author

New iteration on 2024-08-27. Some of the issues seem solved, thanks @wstolte !

$ ncdump -h foo.nc 
netcdf foo {
dimensions:
	lon = 4361 ;
	lat = 4209 ;
	aphiaid = 4 ;
	string80 = 80 ;
variables:
	double lon(lon) ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:long_name = "Longitude" ;
	double lat(lat) ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:long_name = "Latitude" ;
	int aphiaid(aphiaid) ;
		aphiaid:long_name = "Life Science Identifier - World Register of Marine Species" ;
	char taxon_name(aphiaid, string80) ;
		taxon_name:standard_name = "biological_taxon_name" ;
		taxon_name:long_name = "Scientific name of the taxa" ;
	char taxon_lsid(aphiaid, string80) ;
		taxon_lsid:standard_name = "biological_taxon_lsid" ;
		taxon_lsid:long_name = "Life Science Identifier - World Register of Marine Species" ;
	char crs ;
		crs:long_name = "Coordinate Reference System" ;
		crs:geographic_crs_name = "WGS 84" ;
		crs:grid_mapping_name = "latitude_longitude" ;
		crs:reference_ellipsoid_name = "WGS 84" ;
		crs:horizontal_datum_name = "WGS 84" ;
		crs:prime_meridian_name = "Greenwich" ;
		crs:longitude_of_prime_meridian = 0. ;
		crs:semi_major_axis = 6378137. ;
		crs:semi_minor_axis = 6356752.31424518 ;
		crs:inverse_flattening = 298.257223563 ;
		crs:spatial_ref = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]" ;
		crs:GeoTransform = "-180 0.08333333333333333 0 90 0 -0.08333333333333333 " ;
	double habitat_suitability(aphiaid, lat, lon) ;
		habitat_suitability:_FillValue = -99999. ;
		habitat_suitability:long_name = "Modelled habitat suitability for a species." ;
}

Indeed I can see that the data is not correctly plot (this case in panoply)

habitat_suitability_in_foo

I have faced this before and fixed by changing the order of the data, not the latitude/longitude. Lat and Lon must be in ascendent order. You read the data as a matrix and via transpose t() or changing order apply(x, 1, rev) you can rotate the data. I have tried quickly but don't have time now to continue, but the solution must be something similar.

library(terra)

array <- terra::rast("foo.nc") |> 
  terra::as.array()

matrix <- array[, , 1] |>
  terra::as.matrix()

dim(matrix)

# Transpose dimensions
matrix <- t(matrix)
# dim(matrix)

# Reverse order of lon
matrix <- apply(matrix, 1, rev)
# dim(matrix)

# Traspose one more time
matrix <- t(matrix)

raster <- terra::rast(matrix)
crs(raster) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"

plot(raster)

image

@salvafern
Copy link
Author

salvafern commented Aug 30, 2024

One more version sent on 2024-08-27. Product looking good!

ncdump -h foo.nc 
netcdf foo {
dimensions:
	lon = 4361 ;
	lat = 4209 ;
	aphiaid = 4 ;
	string80 = 80 ;
variables:
	double lon(lon) ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:long_name = "Longitude" ;
	double lat(lat) ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:long_name = "Latitude" ;
	int aphiaid(aphiaid) ;
		aphiaid:long_name = "Life Science Identifier - World Register of Marine Species" ;
	char taxon_name(aphiaid, string80) ;
		taxon_name:standard_name = "biological_taxon_name" ;
		taxon_name:long_name = "Scientific name of the taxa" ;
	char taxon_lsid(aphiaid, string80) ;
		taxon_lsid:standard_name = "biological_taxon_lsid" ;
		taxon_lsid:long_name = "Life Science Identifier - World Register of Marine Species" ;
	char crs ;
		crs:long_name = "Coordinate Reference System" ;
		crs:geographic_crs_name = "WGS 84" ;
		crs:grid_mapping_name = "latitude_longitude" ;
		crs:reference_ellipsoid_name = "WGS 84" ;
		crs:horizontal_datum_name = "WGS 84" ;
		crs:prime_meridian_name = "Greenwich" ;
		crs:longitude_of_prime_meridian = 0. ;
		crs:semi_major_axis = 6378137. ;
		crs:semi_minor_axis = 6356752.31424518 ;
		crs:inverse_flattening = 298.257223563 ;
		crs:spatial_ref = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]" ;
		crs:GeoTransform = "-180 0.08333333333333333 0 90 0 -0.08333333333333333 " ;
	double habitat_suitability(aphiaid, lat, lon) ;
		habitat_suitability:_FillValue = -99999. ;
		habitat_suitability:long_name = "Modelled habitat suitability for a species." ;
}

habitat_suitability_in_foo

There are only some details to be tackled:

  • taxon_lsid is the same as taxon_name, could you make taxon_lsid be "urn:lsid:marinespecies.org:taxname:" + aphiaid?
$ ncdump -v taxon_lsid foo.nc 

 taxon_lsid =
  "Sabellaria spinulosa",
  "Lanice conchilega",
  "Modiolus modiolus",
  "Ostrea edulis" ;

  • Could you rename the long_name attribute of "habitat_suitability" to "Probability of occurrence of biological entity specified elsewhere per unit area of the bed"
  • Could you rename the variable "habitat_suitability" to "probability_of_occurrence"?
  • Could you add the attribute "units" = "level" to the variable "aphiaid"?
  • Could you fill the global attributes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants