OTE is the authority who publishes the electricity and gas market prices in Czechia, e.g. the electricity market price for the date of 8-Nov-2015 in a human-readable format is available at https://www.ote-cr.cz/cs/kratkodobe-trhy/elektrina/denni-trh?date=2015-11-08 (there is also an option to download the data as a file in the .xls
format). For an unknown reason, OTE does not provide a public REST API interface serving this data. And thus this application.
The purpose of this application is two-fold:
- Scrape the market data from the OTE web site and store the data in a database.
- Provide a REST API interface which enables to retrieve the data stored in the database.
Each of these purposes is served by a dedicated Node.js module of this application described below. Additionally, this document explains the database data model and deployment of the application.
Further, this document outlines possible use of this application and possible enhancements.
Last but not least, there is also a disclaimer.
The data scraper module scrapes the market data from the OTE web site for a pre-configured period of time (see Configuration (DSM)) and stores the data in a database.
NPM package | Version | Comment |
---|---|---|
axios | ^1.7.9 | To read the remote OTE web page. |
cheerio | ^1.0.0 | To parse and scrape the remote OTE web page. |
dotenv | ^16.4.7 | To read the environment variables. |
mongoose | ^8.9.0 | To read data from and write data to a MongoDB database. |
util | ^0.12.5 |
It is possible to configure the module functionality by setting the following constants in the module source code:
CONF_START_DATE
- The first date to scrape the data for in the formatYYYY-MM-DD
.CONF_PERIOD_IN_DAYS
- The number of days to scrape the data for.
There are not any verifications of these configuration parameters implemented; this is a possible enhancement of this module.
The OTE API module is a very simple REST API server which receives an HTTP request at GET /marketData
and replies with a response containing JSON encoded data in the body.
Parameters of the request are documented in the section Configuration (OAM).
The response data are fetched from the database where they were previously stored by the data scraper module (DSM).
In addition to the dependencies listed in the Dependencies (DSM), the OTE API module has the following dependencies:
NPM package | Version | Comment |
---|---|---|
express | ^4.21.2. | To buid the OAM API. |
http | ^0.0.1-security | To provide the OAM API as an HTTP server. |
The HTTP request at GET /marketData
has the following two query parameters:
startDate
- Mandatory, the first date of the period to return the market data for.endDate
- Optional, the last date of the period to return the market data for. If omitted, the default value is equal to a value ofstartDate
and the response contains data for a single day.
A format of both parameters is YYYY-MM-DD
.
As it was mentioned above, the data scraper module uses a database to store the scraped data and the OTE API module uses the same database to provide the data through a REST API.
The database is MongoDB where the OTE market data for a single day is stored as a document of the oteOneDay
model:
const oteOneDaySchema = new mongoose.Schema({
date: {
type: mongoose.Schema.Types.Date,
unique: true,
},
marketData: [
Array
],
}, {
methods: {
},
});
export default mongoose.model('oteOneDay', oteOneDaySchema);
In the data model above, the marketData
is a two dimensional array of numbers, where the first dimension represents one-hour-slots during a day. Normally, there are 24 entries in this dimension. However, there are two exceptions during a calendar year:
- On the day when the CET time zone transitions to the CEST time zone (23 entries).
- On the day when the CEST time zone transitions to the CET time zone (25 entries).
The second dimension represents different types of numerical data for the given one-hour slot:
- The first value is a zero-based number of the one-hour slot.
- The second value is the electricity market price in
EUR/MWh
for the one-hour slot. - The third value is the amount of electricity traded in
MWh
for the one-hour slot. - The meaning of the fourth, five-th and sixth value is not perfectly clear to the author of this application.
-
Get MongoDB ready.
-
Connect to MongoDB using
mongosh
.mongosh mongodb://<username>:<password>@<MongoDB IP address>:<port>
-
Switch to the DB which you want to use for the application.
use <DB name>
-
Create a user with write access to the DB (take a note of the password so you can set the application environment variable later in the set up process).
db.createUser({ user: "<DB user>", pwd: passwordPrompt(), roles: [{role: "readWrite", db: "<DB name>"}] })
-
-
Install the application.
-
Using CLI, change to the directory of your choice.
-
Clone the application repo.
git clone https://github.com/janspudich/OTE-data-scraper-and-API.git
-
Install the dependencies
npm install
-
-
Configure the environment.
In the project root directory, create the
.env
file and set the following variables in the file:Variable name Variable value Comment OTE_BE_TUPLE_PORT
A port on which the OTE API module listens for connections. OTE_MONGO_SCHEMA
"mongodb" The MongoDB schema. OTE_MONGO_USER
The MongoDB user. OTE_MONGO_PASSW
The MongoDB user's password. OTE_MONGO_IP
An IP address of the MongoDB server. OTE_MONGO_PORT
A port of the MongoDB server. OTE_MONGO_DB_NAME
A name of the MongoDB database. OTE_HASH_KEY
A secret for hashing the API keys. -
Optional: configure the data scraper module, see Configuration (DSM).
-
Run the data scraper module:
node ./src/scraper.js
. -
Run the OTE API module:
node ./src/oteDataApi.js
.
This application provides a REST API interface which enables machines to consume the electricity market prices as published by OTE in Czechia.
Any application or system adding a value on top of the raw data can benefit from the application available in this repo. Examples of these systems include (but are not limited to) business intelligence (BI) or data visualization solutions.
- The data scraper module
- Use Memoizee or a similar caching solution.
- Verify the configuration parameters, incl. error handling.
- Expose the configuration parameters as command line options.
- A test suite.
- The OTE API module
- Develop the OAS file
- Verify the query parameters
startDate
andendDate
, incl. error handling. - A test suite.
The author is by no means affiliated with OTE and does not have any formal (or informal) relationship with the organization.
By using the data scraper module, you create a risk of generating a load which is higher then what the OTE infrastructure has been designed to handle. Use responsibly at your own risk.