IOC

The Sea Level Station Monitoring Facility website provides operational monitoring of sea level measuring stations across the globe on behalf of the Intergovernmental Oceanographic Commission (IOC), aggregating data from more than 170 providers.

A DataFrame with the IOC station metadata can be retrieved with get_ioc_stations() while the station data can be fetched with fetch_ioc_station():

searvey.get_ioc_stations(region=None, lon_min=None, lon_max=None, lat_min=None, lat_max=None)

Return IOC station metadata from: http://www.ioc-sealevelmonitoring.org/list.php?showall=all

If region is defined, then the stations outside of the region are filtered out. If the coordinates of the Bounding Box are defined, then the stations outside of the BBox are filtered out. If both region and the Bounding Box are defined, then an exception is raised.

Note: The longitudes of the IOC stations are in the [-180, 180] range.
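The filtering rules above can be illustrated with a minimal, standard-library sketch. It is not searvey's implementation: the function name filter_stations, the station records, the ValueError type and the treatment of a missing bound as "unbounded" are all assumptions made for illustration. Polygon filtering is deliberately left out.

```python
def filter_stations(stations, region=None, lon_min=None, lon_max=None,
                    lat_min=None, lat_max=None):
    """Keep the stations that fall inside the bounding box (illustrative only)."""
    bbox = (lon_min, lon_max, lat_min, lat_max)
    bbox_given = any(value is not None for value in bbox)
    if region is not None and bbox_given:
        # Assumption: the docs only say "an exception"; ValueError is a guess.
        raise ValueError("Pass either `region` or a bounding box, not both.")
    if region is not None:
        raise NotImplementedError("Polygon filtering is out of scope here.")
    # Assumption: a missing bound leaves that side of the box open.
    lon_min = -180.0 if lon_min is None else lon_min
    lon_max = 180.0 if lon_max is None else lon_max
    lat_min = -90.0 if lat_min is None else lat_min
    lat_max = 90.0 if lat_max is None else lat_max
    return [
        station for station in stations
        if lon_min <= station["lon"] <= lon_max
        and lat_min <= station["lat"] <= lat_max
    ]


# Hypothetical station records; longitudes are in the [-180, 180] range.
stations = [
    {"ioc_code": "stn_a", "lon": 23.7, "lat": 37.9},
    {"ioc_code": "stn_b", "lon": -99.9, "lat": 16.8},
    {"ioc_code": "stn_c", "lon": 151.2, "lat": -33.9},
]

inside = filter_stations(stations, lon_min=20, lon_max=30, lat_min=30, lat_max=40)
print([station["ioc_code"] for station in inside])
```

Because IOC longitudes are in [-180, 180], a bounding box expressed in the [0, 360] convention would need to be converted before filtering.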

Parameters:
  • region (MultiPolygon | Polygon | None) – Polygon or MultiPolygon denoting region of interest.

  • lon_min (float | None) – The minimum Longitude of the Bounding Box.

  • lon_max (float | None) – The maximum Longitude of the Bounding Box.

  • lat_min (float | None) – The minimum Latitude of the Bounding Box.

  • lat_max (float | None) – The maximum Latitude of the Bounding Box.

Returns:

GeoDataFrame – pandas.DataFrame with the station metadata.

searvey.fetch_ioc_station(station_id, start_date=None, end_date=None, *, rate_limit=None, http_client=None, multiprocessing_executor=None, multithreading_executor=None, progress_bar=False)

Make a query to the IOC API for tide gauge data for station_id and return the results as a pandas.DataFrame.

fetch_ioc_station("acap2")
fetch_ioc_station("acap2", start_date="2023-01-01", end_date="2023-01-02")

start_date and end_date can be of any type that is valid for pandas.to_datetime(). If start_date or end_date is a timezone-aware timestamp, it is coerced to UTC. The returned data are always in UTC.
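As an illustration of what coercing a timezone-aware timestamp to UTC means, here is a small sketch using only the standard library rather than pandas. The helper to_utc is hypothetical, and treating a naive timestamp as already being in UTC is an assumption, not documented searvey behaviour.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp: datetime) -> datetime:
    """Convert an aware datetime to UTC (illustrative helper)."""
    if timestamp.tzinfo is None:
        # Assumption: a naive timestamp is interpreted as UTC.
        return timestamp.replace(tzinfo=timezone.utc)
    return timestamp.astimezone(timezone.utc)


# 02:00 at a fixed UTC+2 offset is midnight in UTC.
utc_plus_two = timezone(timedelta(hours=2))
start = datetime(2023, 1, 1, 2, 0, tzinfo=utc_plus_two)
print(to_utc(start).isoformat())  # 2023-01-01T00:00:00+00:00
```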

Each query to the IOC API can request up to 30 days of data. When we request data for larger time spans, multiple requests are made. This is where rate_limit, multiprocessing_executor and multithreading_executor come into play.
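To make the 30-day limit concrete, the following sketch splits a longer time span into consecutive windows of at most 30 days, one per request. The function split_into_windows is a hypothetical name; how searvey actually computes its windows (for example, whether they are aligned to fixed dates) is not stated here, so treat this as one possible approach.

```python
from datetime import datetime, timedelta


def split_into_windows(start, end, max_days=30):
    """Yield consecutive (window_start, window_end) pairs of at most max_days."""
    step = timedelta(days=max_days)
    window_start = start
    while window_start < end:
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end


# A span of roughly ten weeks needs three requests.
windows = list(split_into_windows(datetime(2023, 1, 1), datetime(2023, 3, 15)))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```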

To make the data retrieval more efficient, a multithreading pool is spawned and the requests are executed concurrently, while adhering to the rate_limit. Parsing the JSON responses is CPU-heavy, so it is done in a multiprocessing pool.
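The general pattern of issuing rate-limited requests from a thread pool can be sketched as follows. This is not searvey's code: SimpleRateLimit and fake_request are hypothetical stand-ins (no network access takes place), and the parsing step is done serially here for brevity instead of in a process pool.

```python
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SimpleRateLimit:
    """Hypothetical helper: space calls at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        time.sleep(max(0.0, slot - now))


def fake_request(window_index: int, limiter: SimpleRateLimit) -> str:
    """Stand-in for one HTTP request; returns a small JSON payload."""
    limiter.wait()
    return json.dumps({"window": window_index, "values": [1.0, 2.0]})


started = time.monotonic()
limiter = SimpleRateLimit(rate=50)  # 50 requests/second keeps the demo fast
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns the payloads in submission order.
    payloads = list(pool.map(lambda i: fake_request(i, limiter), range(5)))
elapsed = time.monotonic() - started

# Parsing is serial in this sketch; the text above describes a process pool.
results = [json.loads(payload) for payload in payloads]
print(len(results), "responses in", round(elapsed, 2), "seconds")
```

The limiter is shared by all worker threads, so adding more workers increases concurrency without exceeding the configured request rate.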

If no arguments are specified, sensible defaults are used. If the pools need to be configured, an executor instance must be passed as an argument. For example:

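import concurrent.futures
from searvey import fetch_ioc_station

executor = concurrent.futures.ProcessPoolExecutor(max_workers=4)  # pool used for parsing the responses
df = fetch_ioc_station("acap", multiprocessing_executor=executor)

Parameters: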
  • station_id (str) – The station identifier. In IOC terminology, this is called ioc_code.

  • start_date (str | date | Timestamp | datetime | datetime64 | None) – The starting date of the query. Defaults to 7 days ago.

  • end_date (str | date | Timestamp | datetime | datetime64 | None) – The finishing date of the query. Defaults to “now”.

  • rate_limit (RateLimit | None) – The rate limit for making requests to the IOC servers. Defaults to 5 requests/second.

  • http_client (Client | None) – The httpx.Client. Can be used to setup e.g. an HTTP proxy.

  • multiprocessing_executor (ExecutorProtocol | None) – An instance of a class implementing the concurrent.futures.Executor API.

  • multithreading_executor (ExecutorProtocol | None) – An instance of a class implementing the concurrent.futures.Executor API.

  • progress_bar (bool) – If True then a progress bar is displayed for monitoring the progress of the outgoing requests.

Returns:

DataFrame – pandas.DataFrame with the station data.

Deprecated API

searvey.get_ioc_data(ioc_metadata, endtime='now', period=1, truncate_seconds=True, rate_limit=<searvey.rate_limit.RateLimit object>, disable_progress_bar=False)

Deprecated since version 0.4.0: Use fetch_ioc_station() instead.

Return the data of the stations specified in ioc_metadata as an xr.Dataset.

truncate_seconds needs some explaining. IOC has more than 1000 stations. When you retrieve data from all (or at least most of) these stations, you end up with thousands of timestamps that each contain only a single datapoint. The returned xr.Dataset will therefore contain a huge number of NaN values, which in turn requires a huge amount of RAM.

To reduce the amount of required RAM, we reduce the number of timestamps by truncating the seconds. This is how it works:

2014-01-03 14:53:02 -> 2014-01-03 14:53:00
2014-01-03 14:53:32 -> 2014-01-03 14:53:00
2014-01-03 14:53:48 -> 2014-01-03 14:53:00
2014-01-03 14:54:09 -> 2014-01-03 14:54:00
2014-01-03 14:54:48 -> 2014-01-03 14:54:00

Nevertheless, this approach has a downside. If a station returns multiple datapoints within the same minute, we end up with duplicate timestamps. When this happens, we keep only the first datapoint and drop the subsequent ones, so you may not retrieve all of the available data.

If you don’t want this behavior, set truncate_seconds to False and you will retrieve the full data.
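The effect of truncating the seconds and keeping only the first datapoint per minute can be reproduced with a short standard-library sketch. The function truncate_and_dedupe and the sea level values are hypothetical, and the readings are assumed to arrive in chronological order; the deprecated function itself operates on xr.Dataset objects, not on plain lists.

```python
from datetime import datetime


def truncate_and_dedupe(readings):
    """Truncate timestamps to the minute and keep the first reading per minute."""
    kept = {}
    for timestamp, value in readings:
        minute = timestamp.replace(second=0, microsecond=0)
        # setdefault() ignores later readings that fall in the same minute.
        kept.setdefault(minute, value)
    return sorted(kept.items())


# Timestamps from the example above, paired with made-up values.
readings = [
    (datetime(2014, 1, 3, 14, 53, 2), 1.10),
    (datetime(2014, 1, 3, 14, 53, 32), 1.12),
    (datetime(2014, 1, 3, 14, 53, 48), 1.15),
    (datetime(2014, 1, 3, 14, 54, 9), 1.20),
    (datetime(2014, 1, 3, 14, 54, 48), 1.22),
]

for minute, value in truncate_and_dedupe(readings):
    print(minute, value)
```

Five readings collapse into two rows, which shows both the memory saving and the data loss described above.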

Parameters:
  • ioc_metadata (DataFrame) – A pd.DataFrame returned by get_ioc_stations

  • endtime (str | date | datetime | Timestamp) – The date of the “end” of the data. Defaults to datetime.date.today()

  • period (float) – The number of days to be requested. IOC does not support values greater than 30

  • truncate_seconds (bool) – If True then timestamps are truncated to minutes (seconds are dropped)

  • rate_limit (RateLimit) – The default rate limit is 5 requests/second.

  • disable_progress_bar (bool) – If True then the progress bar is not displayed.

Returns:

Dataset – An xr.Dataset with the station data.