diff --git a/GraphQL_intro.md b/GraphQL_intro.md new file mode 100644 index 00000000..52f91c60 --- /dev/null +++ b/GraphQL_intro.md @@ -0,0 +1,185 @@ +--- +layout: default +title: GraphQL & API +parent: Usage +nav_order: 2 +permalink: /usage/graphql +--- +# GraphQL & API +{: .no_toc } + +## Table of contents +{: .no_toc .text-delta } + +1. TOC +{:toc} + +--- +# Introduction to GraphQL and querying the API + +GraphQL is a query language for Application Programming Interfaces (APIs), which documents what data is available in the API and allows to query and get exactly the data we want and nothing more. + +This tutorial provides a short introduction to GraphQL, but we recommend you to explore the [GraphQL documentation](https://graphql.org/learn/) and other [introductory resources like this one](https://docs.github.com/en/graphql/guides/introduction-to-graphql) to learn more. + +In the GraphQL API Queries are written in the GraphQL language, and the result (the data) is given back in [JSON](https://www.w3schools.com/whatis/whatis_json.asp) format. JSON (from JavaScript Object Notation) is a standard text-based format for representing structured data. It is widely used for transmitting data in web applications, and it can easily be reformatted into tables or data frames within programming languages like R or Python. + +Zendro provides a GraphQL API web interface, called Graph**i**QL, which is a Web Browser tool for writing, validating, and testing GraphQL queries. + +You can live-try an example here, which is the API that we will be using in this and other tutorials: [https://zendro.conabio.gob.mx/api/graphql](https://zendro.conabio.gob.mx/api/graphql). + +Zendro's GraphQL API allows not only to query the data, but also to create, modify or delete records (`mutate`). This is available only with authentication (i.e. logging in with edit permissions), and it won't be covered in this tutorial, but you can check other Zendro's How to guides for details on mutations. + +## GraphiQL web interface + +The GraphiQL API web interface has the following main components: + +* A **left panel** where you can write your query in GraphQL format. +* A **right panel** where the result of the query is provided in JSON format. +* A **play button** to execute the query. Its keyboard shortcut is `Ctr+E`. +* A **Documentation Explorer** side menu, which you can show or hide clicking on "Docs" in the top right corner. + + +![API_parts.png](../figures/API_parts.png) + +Data in GraphQL is organised in **types** and **fields** within those types. When thinking about your structured data, you can think of **types as the names of tables**, and **fields as the columns of those tables**. The records will be the rows of data from those tables. You can learn more in the [GraphQL documentation](https://graphql.org/learn/). + +A GraphQL service is created by defining types and fields on those types, and providing functions for each field on each type. + +The documentation explorer allows to examine what operations (e.g. query, mutation) are allowed for each type. Then, clicking on `Query` will open another view with the details of what operations are available to query the data. In this view, all types available in a given dataset are listed in alphabetical order, with the operations than can be done within them listed below. + +In the example of the image above, we can see that the first type is `cities`. Types can contain elements or arguments, which are specified inside parentheses `()`. 
Some of these may be required arguments (marked with `!`), such as `pagination`.
+You can extend the bottom panel ("Query variables") to provide dynamic inputs to your query. [Learn more](https://graphql.org/learn/queries/#variables).
+
+## Writing queries
+
+The [GraphQL documentation](https://graphql.org/learn/) includes plenty of resources to learn how to build queries and make the most out of the power of GraphQL. Below we provide just a short summary, after which we recommend exploring the [GraphQL documentation](https://graphql.org/learn/) to learn more. Feel free to try your queries in the [Zendro Dummy API](https://zendro.conabio.gob.mx/dummy_api) we set up for testing.
+
+**GraphQL syntax tips:**
+
+* Queries and other operations are written between curly braces `{}`.
+* Types can contain elements or arguments, which are specified inside parentheses `()`.
+* Use a colon `:` to set argument values (e.g. `pagination:{limit:10, offset:0}`).
+* Use a hashtag `#` to include comments within a query, which are useful to document what you are doing.
+* A query should provide at least one type (e.g. `rivers`), at least one field (e.g. `name`), and any mandatory arguments the types have (marked with `!` in the Docs).
+* In Zendro `pagination` is a mandatory argument. It refers to the number of records (`limit`) the output returns, starting from a given `offset`. If you don't specify the offset, by default it will be `offset:0`.
+
+A simple query will look like this:
+
+```
+{rivers(pagination:{limit:10, offset:0}){
+  # fields you want from the "rivers" type go here
+  name
+  }
+}
+```
+
+Copy-pasting and executing the query above in GraphiQL looks like the following image. That is, we got the names of the first 10 rivers in the data:
+
+![API_query1.png](../figures/API_query1.png)
+
+But how did we know that `name` is a field within `rivers`? There are two options:
+
+**Option 1: Check the Docs panel**
+
+Click on `Query`, then look for the type you want (in this example `rivers`), and then click on `[river]`. This will open the list of fields available for it, along with their documentation:
+
+![API_docs2.png](../figures/API_docs2.png)
+
+
+**Option 2: Autocomplete while you type**
+
+If you want to know what fields are available for the type `rivers`, you can press `Ctrl+Space` within the curly braces `{}` after `rivers(pagination:{limit:10, offset:0})`. A menu will appear showing you all possible fields.
+
+![API_query2.png](../figures/API_query2.png)
+
+In this example, we can get the fields `river_id`, `name` and `country_ids`. The rest of the list is related to `countries`, because `rivers` is associated with `countries`, and therefore we can build a more complex query with them.
+
+But first, let's build a query that returns the fields `river_id`, `name`, `length` and `country_ids` from the type `rivers`, like this:
+
+```
+{rivers(pagination:{limit:10, offset:0}){
+  river_id
+  name
+  length
+  country_ids
+  }
+}
+```
+
+As a result of the query, for each of the first 10 rivers (10 because we set `limit:10`) in the data we will get its id, name, length, and the id of any country it is associated with:
+
+![API_query3.png](../figures/API_query3.png)
+
+### Extracting data from different types (i.e. dealing with associations)
+
+GraphQL can get fields associated with a record in different types, allowing us to get only the variables and records we need from the entire dataset.
For example, we can get the name and length of a river, but also the name and population of the countries it crosses.
+
+Extracting data from associated types depends on whether the association is *one to one* (a city belongs to one country) or *one to many* (a river can cross many countries).
+
+#### One to one
+
+When the association is *one to one*, the associated data model will appear as just another field. For example, each `city` is associated with one `country`, therefore `country` is one of the fields available within `cities`.
+
+If you look at the Docs, you will notice that it is not just another field: you need to provide it with a search input.
+
+![API_city](../figures/API_city.png)
+
+In this case we want to find the country each city is associated with, and we know that the field they have in common (i.e. the key) is `country_id`, therefore your query should look like:
+
+```
+{
+cities(pagination:{limit:10, offset:0}){
+  city_id
+  name
+  population
+  country(search:{field:country_id}){
+    name
+    population
+  }
+  }
+}
+```
+
+
+#### One to many
+
+When the association is *one to many*, there will be a `Connection` for each association the model has. For example, to see the countries a river is associated with, we need to use `countriesConnection`:
+
+```
+{rivers(pagination:{limit:10, offset:0}){
+  river_id
+  name
+  length
+  country_ids
+  countriesConnection(pagination:{first:1}){
+    countries{
+      name
+      population}
+  }
+  }
+}
+```
+
+Remember to check the Docs for any mandatory argument. In this case `pagination` is mandatory. You can check what you are expected to provide in its `paginationCursorInput` by clicking on it in the documentation. Also check the [pagination documentation](https://zendro-dev.github.io/api_root/graphql#pagination-argument) for details on how to use this argument.
+
+After you execute the query, you will get the same data we got for each river before, but also the data of the country (or countries, as the case may be) it is associated with.
+
+![API_query4.png](../figures/API_query4.png)
+
+
+In the above examples all the arguments are inside the query string. But the arguments to fields can also be dynamic: for instance, there might be a dropdown menu in an application that lets the user select which city they are interested in, or a set of filters.
+
+To avoid manipulating the query string at run time, GraphQL can factor dynamic values out of the query and pass them as a separate dictionary. These values are called **variables**. Common variables include search, order and pagination. To work with variables you need to do three things:
+
+1. Replace the static value in the query with `$variableName`
+2. Declare `$variableName` as one of the variables accepted by the query
+3. Pass `variableName: value` in the separate, transport-specific (usually JSON) variables dictionary
+
+Check the [official documentation](https://graphql.org/learn/queries/#variables) for examples; a short sketch is also given at the end of this section.
+
+As you can see, you can write much more complex queries to get the data you want. Please explore the [GraphQL documentation](https://graphql.org/learn/) or the many other resources out there to learn more. The above examples should get you going if you want to get data to perform analyses in R or Python.
+
+Before trying to download data from R, Python or any other programming language using the GraphQL API, we recommend writing the query in the GraphiQL web interface and making sure it returns the desired data, as in the right panel of the image above.
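+
+For instance, the `rivers` query used earlier can be rewritten with a dynamic pagination variable. This is only a sketch: the operation name `riversByPage` is our own choice, and we assume here that the pagination input type for list queries is called `paginationInput`; please confirm the exact type name for your instance in the Docs panel.
+
+```
+query riversByPage($pagination: paginationInput!) {
+  rivers(pagination: $pagination) {
+    river_id
+    name
+    length
+  }
+}
+```
+
+With the following JSON pasted into the "Query variables" panel:
+
+```
+{
+  "pagination": { "limit": 10, "offset": 0 }
+}
+```
+
+Executing this should return the same result as the static query above; only the pagination values now come from the variables dictionary, so an application can change them without rebuilding the query string.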
+ +**Next step?** Check Zendro How to guides for tutorials on how to use the GraphQL API from R or Python to explore and analyse data stored in Zendro. diff --git a/README.md b/README.md index 35028905..bb839add 100644 --- a/README.md +++ b/README.md @@ -8,41 +8,74 @@ Zendro is a software tool to quickly create a data warehouse tailored to your sp Zendro consists of two main components, backend and frontend. The backend component has its [base project](https://github.com/ScienceDb/graphql-server) and a [code generator](https://github.com/ScienceDb/graphql-server-model-codegen). The frontend of SPA (Single Page Application) also has its [base project](https://github.com/ScienceDb/single-page-app). See the guides below on how to use Zendro. -Also find Zendro-dev on [github](https://github.com/Zendro-dev). +To see or contribute to our code please visit Zendro-dev on [github](https://github.com/Zendro-dev), where you can find the repositories for: -If you have any questions or comments, please don't hesitate to contact us via an issue [here](https://github.com/Zendro-dev/Zendro-dev.github.io/issues). Tag your issue as a question and we will try to answer as quick as possible. +* [GraphQL server](https://github.com/ScienceDb/graphql-server) +* [GraphQL server model generator](https://github.com/ScienceDb/graphql-server-model-codegen) +* [Single page application](https://github.com/ScienceDb/single-page-app) + +If you have any questions or comments, please don't hesitate to contact us via an issue [here](https://github.com/Zendro-dev/Zendro-dev.github.io/issues). Tag your issue as a question or bug and we will try to answer as quick as possible. + +## SHOW ME HOW IT LOOKS! + +Would you like to see Zendro in action before deciding to learn more? That's fine! We set up a Dummy Zendro Instance for you to explore [Zendro's graphical user interface]( https://zendro.conabio.gob.mx/spa) and [Zendro's API]( https://zendro.conabio.gob.mx/graphiql). The tutorials on how to [use Zendro day to day](#using-zendro-day-to-day) of the section below use this instance, so go there to start exploring. -[![Go to Quickstart](./figures/quick.png)](https://github.com/Zendro-dev/Zendro-dev.github.io/blob/master/quickstart.md) +### Installation and sysadmin -[![Go to Getting started guide](./figures/gettingstarted.png)](https://github.com/Zendro-dev/Zendro-dev.github.io/blob/master/setup_root.md) +To start trying Zendro you can try the [Quickstart tutorial](https://zendro-dev.github.io/quickstart.html) on how to create a new Zendro project with pre-defined datamodels, database and environment variables. Then you can try the [Getting started tutorial](https://zendro-dev.github.io/setup_root.html), a step-by-step guide on how to create a new Zendro project from scratch, aimed at software developers and system administrators. + +[](quickstart.md) + +[](setup_root.md) + +For more sysadmin details also check: ### HOW-TO GUIDES: -* [How to define data models: for developers](setup_data_scheme.md). Detailed technical specifications on how to define data models for Zendro, aimed at software developers and system administrators. -* [How to define data models: for non-developers](non-developer_documentation.md). A brief, illustrated guide of data model specifications, data formatting and data uploading options, aimed at data modelers or managers to facilitate collaboration with developers. -* [How to setup a distributed cloud of zendro nodes](ddm.md). 
A brief guide, aimed at software developers and system administrators, on how to use Zendros distributed-data-models.
 * [How to use Zendro command line interface (CLI)](zendro_cli.md). A tutorial of Zendro CLI, aimed at software developers.
-* [How to query and extract data](fromGraphQlToR.html). A concise guide on how to use the Zendro API from R to extract data and perform queries, aimed at data managers or data scientists.
 * [How to setup Authentication / Authorization](oauth.md). A concise guide on how to use and setup the Zendro authorization / authentication services.
-* [API documentation](api_root.md).
+* [How to setup a distributed cloud of Zendro nodes](ddm.md). A brief guide, aimed at software developers and system administrators, on how to use Zendro's distributed data models.
+* [API documentation](api_root.md). A summary of how Zendro's backend generator implements a CRUD API that can be accessed through the GraphQL query language.
-### REPOSITORIES:
+### Defining data models
-* [GraphQL server](https://github.com/ScienceDb/graphql-server)
-* [GraphQL server model generator](https://github.com/ScienceDb/graphql-server-model-codegen)
-* [Single page application](https://github.com/ScienceDb/single-page-app)
+* [How to define data models: for developers](setup_data_scheme.md). Detailed technical specifications on how to define data models for Zendro, aimed at software developers and system administrators.
+* [How to define data models: for non-developers](what_are_data_models.md). A brief, illustrated guide of data model specifications, data formatting and data uploading options, aimed at data modelers or managers to facilitate collaboration with developers.
+
+### Using Zendro day to day
+
+* [How to use Zendro's graphical interface](SPA_howto.md). A full guide on how to use Zendro's graphical point-and-click interface. Aimed at general users and featuring lots of screenshots.
+* [Introduction to GraphQL and querying the API](GraphQL_intro.md). A friendly introduction to performing GraphQL queries and using the GraphiQL documentation.
+* [How to query and extract data from R](fromGraphQlToR.html). A concise guide on how to use the Zendro API from R to extract data and perform queries, aimed at data managers or data scientists.
+* [How to use the Zendro API with python to make changes to the data](Zendro_requests_with_python.md). A concise guide on how to access the API using your user credentials to perform CRUD operations on the data using Python.
+
+## Zendro user profiles
-### CONTRIBUTIONS
+We designed Zendro to be useful for research teams and institutions that include users with different areas of expertise, needs and types of activities.
The table below summarizes how we envision that different users will use Zendro: + + +| Profile | Background | Expected use | +|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| General user / scientist | | | +| Data scientist / data analyst | | | +| Data manager | | | +| Sysadmin | | | + +# CONTRIBUTIONS Zendro is the product of a joint effort between the Forschungszentrum Jülich, Germany and the Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, México, to generate a tool that allows efficiently building data warehouses capable of dealing with diverse data generated by different research groups in the context of the FAIR principles and multidisciplinary projects. The name Zendro comes from the words Zenzontle and Drossel, which are Mexican and German words denoting a mockingbird, a bird capable of “talking” different languages, similar to how Zendro can connect your data warehouse from any programming language or data analysis pipeline. -#### Zendro contributors in alphabetical order -Francisca Acevedo1, Vicente Arriaga1, Katja Dohm3, Constantin Eiteneuer2, Sven Fahrner2, Frank Fischer4, Asis Hallab2, Alicia Mastretta-Yanes1, Roland Pieruschka2, Alejandro Ponce1, Yaxal Ponce2, Francisco Ramírez1, Irene Ramos1, Bernardo Terroba1, Tim Rehberg3, Verónica Suaste1, Björn Usadel2, David Velasco2, Thomas Voecking3 +### Zendro contributors in alphabetical order +Francisca Acevedo1, Vicente Arriaga1, Vivian Bass1, Katja Dohm3, Jaime Donlucas1, Constantin Eiteneuer2, Sven Fahrner2, Frank Fischer4, Asis Hallab2, Alicia Mastretta-Yanes1, Roland Pieruschka2, Erick Palacios-Moreno1, Alejandro Ponce1, Yaxal Ponce2, Francisco Ramírez1, Irene Ramos1, Bernardo Terroba1, Tim Rehberg3, Ulrich Schurr2, Verónica Suaste1, Björn Usadel2, David Velasco2, Thomas Voecking3 and Dan Wang2 -##### Author affiliations +#### Author affiliations 1. CONABIO - Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, México -2. Forschungszentrum Jülich - Germany -3. auticon - www.auticon.com +2. Forschungszentrum Jülich, Germany +3. Auticon - www.auticon.com 4. InterTech - www.intertech.de -#### Zendro author contributions -Asis Hallab and Alicia Mastretta-Yanes coordinated the project. Asis Hallab designed the software. Programming of code generators, the browser based single page application interface, and the GraphQL application programming interface was done by Katja Dohm, Constantin Eiteneuer, Francisco Ramírez, Tim Rehberg, Veronica Suaste, David Velasco, Thomas Voecking, and Dan Wang. Counselling and use case definitions were contributed by Francisca Acevedo, Vicente Arriaga, Frank Fischer, Roland Pieruschka, Alejandro Ponce, Irene Ramos, and Björn Usadel. User experience and application of Zendro on data management projects was carried out by Asis Hallab, Alicia Mastretta-Yanes, Yaxal Ponce, Irene Ramos, Verónica Suaste, and David Velasco. 
Logo design was made by Bernardo Terroba. +#### Author contributions +Conceptualization, management and coordination of the project was done by Asis Hallab and Alicia Mastretta-Yanes. Software design was done by Asis Hallab. Programming, implementation and testing of the computer code was done by Vivian Bass, Katja Dohm, Constantin Eiteneuer, Asis Hallab, Francisco Ramírez, Tim Rehberg, Veronica Suaste, David Velasco, Thomas Voecking and Dan Wang. Use case definitions were provided by Frank Fischer, Roland Pieruschka, Irene Ramos, and Björn Usadel. Acquisition of the financial support for the project was contributed by Francisca Acevedo, Vicente Arriaga and Björn Usadel. User experience and application of Zendro on data management projects was carried out by Vivian Bass, Jaime Donlucas, Asis Hallab, Alicia Mastretta-Yanes, Erick Palacios Moreno, Alejandro Ponce, Yaxal Ponce, Irene Ramos, Verónica Suaste and David Velasco. Writing the original draft of the manuscript and software documentation was done by Vivian Bass, Constantin Eiteneuer, Asis Hallab, Alicia Mastretta-Yanes, Irene Ramos, Verónica Suaste and Dan Wang. Logo desing was done by Bernardo Terroba. + +### Funding + +On the Mexican side, Zendro was developed with resources of the Global Environmental Fund though the Mexican Agrobiodiversity Project (GEF Project ID: 9380). On the German side Zendro was developed with resources of the projects [german grants here]. diff --git a/SPA_howto.md b/SPA_howto.md new file mode 100644 index 00000000..9c46d920 --- /dev/null +++ b/SPA_howto.md @@ -0,0 +1,227 @@ +--- +layout: default +title: Graphical UI +parent: Usage +nav_order: 3 +permalink: /usage/spa +--- +# Graphical UI +{: .no_toc } + +## Table of contents +{: .no_toc .text-delta } + +1. TOC +{:toc} + +--- + +# How to use Zendro's graphical user interface + + +Zendro's graphical point-and-click user interface is accessible in a web browser, as a single page application (SPA). + +To explore how it looks, you can look at the screenshots below, or even better try it out live! Just go to [https://zendro.conabio.gob.mx/spa](https://zendro.conabio.gob.mx/spa), where you will find a dummy Zendro instance we set up for you to try. + +Zendro's home graphical interface looks similar to the image below. But of course it can be customised (by the person who installed Zendro) to show what you prefer: + +![SPA_home.png](../figures/SPA_home.png) + +All data administration functions available in the graphical user interface are subjected to user based access control, meaning a user only sees the respective icons, buttons, and even model names in the main menu, if and only if she / he has the required access rights to the respective read or write operations. + +## Login + +Clicking in LOGIN will prompt you for your username and password. Zendro's graphical interface allows users to Create, Read, Update or Delete (CRUD) records, but you can decide which users can do what. For instance only one or two people on a research team may have edit access rights to create, update or delete records, while several other members of the team could be allowed to read. + +We created a user with reading access rights for you to explore. You can login with the user shown in the image below and the password *reader*. + +![SPA_login.png](../figures/SPA_login.png) + +## Exploring data + +Upon successful sign-in in the graphical interface, the user is presented with an overview menu on the left, offering one entry per data model. 
That is, Zendro will create a table for each of the data models provided when setting it up. + +In this dummy example the models are "City", "River" and "Country". The home is shown in blank by default, but you can use this area to add whatever you want, like models documentation or project introduction. + +![SPA_models1.png](../figures/SPA_models1.png) + +Clicking on a data model name will present the user with the **main data model table**, showing a column for each field in the data model (in this example "city_id", "name" and "population" of the model "city"), and a row for each record. + + +![SPA_models2.png](../figures/SPA_models2.png) + +If you are exploring a table with lots of rows, you can modify the number of rows the table shows at a time by clicking on the number in the lower-left corner. The number of pages will be modified automatically to fit all the data to the desired number of rows per page. Try that in the river model: + +![SPA_rivers.png](../figures/SPA_rivers.png) + +At the bottom right of the table, the user can skip forward or backward through pages: + +![SPA_paginationmenu.png](../figures/SPA_paginationmenu.png) + + +You can hide or expand the data models menu by clicking in the ">" icon: + +![SPA_hidemenu.png](../figures/SPA_hidemenu.png) + + +Clicking on the column name will sort the data by that column in ascending alphabetical order. Above the table the user can enter search terms which will be matched against any column, if the respective field is of type string (text), and against any numeric columns, if the entered search term can be translated to a number. + +For each record, the table offers the user the option to open the detailed view of a selected record ("eye" icon). Users with edit and delete permissions also see the option to edit ("pencil" icon) or delete the record ("trash" icon): + + +![SPA_crud.png](../figures/SPA_crud.png) + +You may have noticed that you can only see the "eye" icon to go to the detailed view, but not the other icons. This is because the user we made public for this tutorial only has reader permissions and hence cannot modify the existing data. + +Clicking on the "eye" icon leads to the detailed view, which enables full inspection of all details ("ATTRIBUTES" tab) that a single record has: + +![SPA_models3.png](../figures/SPA_models3.png) + +If the user has edit permissions, they would also see buttons for opening the edit and delete forms. See [Editing data](#editing-data) below for details. + +![SPA_editoptions.png](../figures/SPA_editoptions.png) + +In the detailed view, all users also see a tab called "ASSOCIATIONS" at the top right, which shows one entry per association that the respective record (data model) has. In this case the record of the city "Aachen" is associated with "COUNTRY" and "CAPITALTO". + +![SPA_models4.png](../figures/SPA_models4.png) + +Upon clicking on an association name the user can now inspect the associated records. If this is an association of many records, i.e. one-to-many or many-to-many association type, a table is shown just like the main table. This table has the same sorting, search, and pagination functions as the data model's main table shown before. 
+ +![SPA_models5.png](../figures/SPA_models5.png) + + + +## Editing data + +### Modify or delete existing records + +For a user with all Create, Read, Update, and Delete permissions, the main table offers the user the option to open the detailed view ("eye" icon), edit ("pencil" icon), or delete ("trash" icon) a given record: + +![SPA_crud.png](../figures/SPA_crud.png) + +Clicking on the "pencil" icon either in the data model's main table or in the detail view will open the edit form, pre-filled with the selected record's data. Here the user can change the data, and a validation error will show up if the data is invalid. + +![SPA_editoptions.png](../figures/SPA_editoptions.png) + +For example, if we try to introduce text in the "population" field, which is defined as an integer, we would be asked to instead enter a valid integer. + +![SPA_validation.png](../figures/SPA_validation.png) + +All fields can be modified in this form, except the id which is the key linking to the record associations. + +To edit the associations, on the "ASSOCIATIONS" tab click the name of the data model where the record to be associated is, for example "country". + +After clicking it, the user is shown a modified version of the data model's main table, in which an additional column enables the association or dissociation of records with the currently edited one. Associated records are marked with a "link chain" icon, and not associated records with a "broken chain link" icon. In this table the user can mark several associations to be executed once the "save" button is clicked. + +![SPA_associatons_edit.png](../figures/SPA_associatons_edit.png) + +Once any of the associated/not associated icons is clicked, they revert to their respective counterpart, a connected chain link becomes a broken one and vice versa. To highlight that these are to be persisted changes the converted icons are highlighted in color, green connected chain link icons indicate that an association with the respective record shall be established, while red broken chain link icons indicate the opposite. + +![SPA_associations_color.png](../figures/SPA_associations_color.png) + +The user can paginate through the whole set of records of the respective association and mark as many change operations as wished for. They will be collected and carried out, once the save button is clicked. Additionally the user can apply a variety of filters to the table. This includes filtering for already associated records as well as filtering those records that have been marked to be added or removed as an association. + +Once the desired associations have been made, click on the "save" icon at the top right menu: + +![SPA_associatons_save.png](../figures/SPA_associatons_save.png) + +Alternatively, if the record you wish to associate to does not exist, then it is possible to manually create it by clicking on the "+" icon at the top right menu: + +![SPA_associations_create.png](../figures/SPA_associations_create.png) + +### Add new records + +If your user has edit permissions, on the top bar of the main data model table you will see the following buttons to Reload the data ("circle arrow" icon), Add new record ("+" icon), Import data ("bold top arrow" icon), Download data as csv ("bold down arrow" icon) and Download the model template ("light down arrow" icon). If your user only has reading access you will only see the Reload and Download icons. + +![SPA_topmenu.png](../figures/SPA_topmenu.png) + +Notice that these buttons work independently for each data model table. 
That is, if you are in the "city" model you will be able to create or download records of the city table, but if you want to add a country you have to click on the "country" model in the left side menu.
+
+#### Add a single record manually
+
+The "+" icon enables the user to **Add a single new data record** to the current data table. Upon clicking it, the user is presented with a form where the values for each field of the new record can be entered.
+
+![SPA_addsingle.png](../figures/SPA_addsingle.png)
+
+If invalid data has been entered and the user attempts to save it, validation error messages are shown, as in the example above when we edited an existing record.
+
+Once you finish typing the data, click on the "save" icon. You don't need to add data to all fields, but you will be asked if you are sure you want to leave some fields blank. Click "yes" to proceed:
+
+![SPA_addblankok.png](../figures/SPA_addblankok.png)
+
+Your new record will be saved. You can click on the "table" icon at the top to see the country table again:
+
+![SPA_addsucess.png](../figures/SPA_addsucess.png)
+
+
+
+#### Add several records from a file
+
+Adding single records one by one is sometimes useful, but many users want to add data in bulk. Users often have data in tables that were created in MS Excel, recorded with a digital device or collected by any other means. You can import this data into Zendro from an **Excel file** (.xlsx), a **comma separated value file** (.csv) or a **json** file. Here we will cover the first two.
+
+The .xlsx or .csv data file is expected to follow these requirements:
+
+1. Column names in the first row must correspond to model attributes (i.e. column names in the Zendro main table for that model). For associations, the corresponding column name should be `add` followed by the association name, e.g. `addCountries` for the association `countries`.
+2. Empty values should be represented as `NULL`.
+3. All fields should be quoted with `"`. However, if the field delimiter and array delimiter do not occur inside String fields, i.e. fields can be split without ambiguity, then no quotes are necessary. For example, if the field delimiter is `,` and a String field is like `Zendro, excellent!`, then without the quotation marks this field will be split into two fields. In such cases these String fields must be quoted.
+4. Default configuration: LIMIT_RECORDS=10000, RECORD_DELIMITER="\n", FIELD_DELIMITER=",", ARRAY_DELIMITER=";". They can be changed in the config file for environment variables.
+5. Date and time formats must follow the [RFC 3339](https://tools.ietf.org/html/rfc3339) standard.
+
+Additionally, if you are uploading your data from a csv file:
+
+1. String (text) records should be quoted with `"`. For example `"Ingredient A, Ingredient B"` instead of `Ingredient A, Ingredient B`. However, if there are no commas (`,`) within any single record, then the quotes are not necessary.
+2. `"NULL"` should be quoted.
+
+In order to get the field names right and check what type each one is (e.g. integer, character, etc.), you can **Download the model template** by clicking on the "light down arrow" at the top right panel of the main data model table. You will be prompted to download a csv file named after the table you are downloading, for example "river". You can open this file in your favourite spreadsheet processor (e.g. Excel).
+
+It will have the column names you need in the first row, and the data type in the second.
Notice that if your data model has associations, the foreign keys of associating records will be shown as columns too. For example because the model "river" is associated with one or more "countries" you have to provide the `addCountries` in the last column. If we fill this example with a list of Mexican rivers, this means that the `addCountries` field should include Mexico's country id, which is "MX" according to the "country" data model table. + +![SPA_csvtemplate.png](../figures/SPA_csvtemplate.png) + +Next, edit this csv file to add your data. You can do this in a spreadsheet processor (e.g. Excel). + +Make sure to: + +* Leave the first line (column names) as it is. +* Replace the second row with data, but remember: the second row tells you what type of data Zendro is expecting for each field, e.g. `Int` = integer numbers, and `String` = text. +* Follow the data requirements detailed above. + +Save as an .xlsx file. It should be ready to upload it to Zendro! + +Alternatively, if you are using a Text Editor (e.g. NotePad, Sublime Text) or a command line program (e.g. R, python) to generate a .csv, your file should look something like this (here we quoted all fields to prevent any ambiguity): + +``` +river_id,name,length,country_ids +"47","Tonalá","82","NULL" +"48","Tuxpan","150","NULL" +"49","Verde","342","NULL" +"50","Yaqui","410","NULL" +``` + +Save it as an .csv file. It should be ready to upload it to Zendro! + +**Note:** you can also save as csv file from Excel, but make sure you add the `"` if any of your strings (text) have `,` within them. + + +To finally upload your data to Zendro click on `Import data` ("bold top arrow" icon) of the top right menu, select your file and click **Upload**: + +![SPA_addExcel.png](../figures/SPA_addExcel.png) + +If there are no errors your data will be uploaded. Click the Reload button of the top right menu if you can't see it. + +If there are errors you will be told what's the problem. Check your data and try again. + +**Note:** the maximum allowed upload size and other variables you see our interact with in the graphical interface are customizable when setting up Zendro. You (or your sys admin) can learn more about this int he [Environment variables](https://zendro-dev.github.io/setup_root/env_var) section of the documentation. + + +## Download data + +To download data, click on the `Download data` button (bold down arrow). You will be prompted to select a directory and file name where to save the data. + +![SPA_download.png](../figures/SPA_download.png) + + +The data will be saved in .csv format, which you can open in Excel or import it to statistical software like R. Notice that this only downloads the data of any given table at a time. Complex queries to download specific data or to show columns from different models in a single table, can be done through Zendro's API. Check Zendro home documentation for tutorials on how to do this. + + + diff --git a/Zendro_requests_with_python.md b/Zendro_requests_with_python.md new file mode 100644 index 00000000..ad1d8dbc --- /dev/null +++ b/Zendro_requests_with_python.md @@ -0,0 +1,277 @@ +--- +layout: default +title: Python +parent: Zendro API +nav_order: 4 +permalink: /api_root/python +--- +# How to use the Zendro API with python to make changes to the data + +If you are a user with CRUD credentials you can use Zendro GraphQL API to directly make changes to the data. 
+ +Using the Zendro API its mostly straightforward and similar to how we write queries in [GraphiQL](https://zendro.conabio.gob.mx/graphiql) (check the [doc]({% link GraphQL_intro.md %})), but with some extra steps. + +The first thing we need to do is to get a session token so we can authenticate our queries. To achieve this, we use the python library `requests`. The steps to get a session token and use it with our requests are the following: + +1. First we need to write the auth credentials in dictionary to make a successful login, we are going to use those credentials in the POST request. +2. Now we use the library requests to make the POST, specifying the url that gives us the session token we need. In this example this is: [https://zendro.conabio.gob.mx/auth/realms/zendro/protocol/openid-connect/token](https://zendro.conabio.gob.mx/auth/realms/zendro/protocol/openid-connect/token). We also need to pass the dictionary we created earlier in the data parameter. +3. If the request was successful we can obtain the session token from the response, then we create a Session object from the requests library, and update the 'headers' to add the 'Authorization' header with the session token as a Bearer token. + + +```python +import requests + +# I used this library to hide the warnings for not verifying the ssl certificates +# of the url +import warnings +warnings.filterwarnings('ignore') + + +# credential for login to zendro +auth = { + "username": "", + "password": "", + "client_id": "zendro_graphql-server", + "grant_type": "password" +} + +# make a post to a zendro-keycloak endpoint to retrieve session token +login = requests.post( + "https://zendro.conabio.gob.mx/auth/realms/zendro/protocol/openid-connect/token", + verify=False, + data=auth +) + +# if status code in the response is 200, then the request was successful and we have +# the session token we need in the login response +if login.status_code == 200: + + # we create a session object to use it for the requests to zendro api + session = requests.Session() + + # and store the token we receive in the 'Authorization' header as a Bearer token + session.headers.update({ + "Authorization": "Bearer " + login.json()["access_token"] + }) + + print("Successful login") + + +``` + + Successful login + + +With a successful login we can now make the requests to the Zendro API. Word of advice: this token will expire after 30 mins. + + +The first query we are going to make, is a **mutation**. In this mutation we are going to create a new country in the country table, and we are going to retrieve the country id if the mutation was successful. + +To write the query we can use the multiline syntax of python, then we insert this string in a dictionary as a value to the key "query", and we pass it to the requests in the json parameter. + +All the requests we are going to do are addressed to the Zendro API url, which in this example is [https://zendro.conabio.gob.mx/api/graphql](https://zendro.conabio.gob.mx/api/graphql), and it is declared in the first parameter of the post method. + +To see if the query ran correctly, we can use the method `json` of the response object. 
+ + +```python + +# we define the query to create a country +country_query = """ + mutation { + addCountry( + country_id:"JP", + name: "Japan", + population: 100000000, + size: 377975 + ) { + country_id + } + } +""" + +# and using the session object we make a POST request to zendro api +# to create the country +country_response = session.post( + "https://zendro.conabio.gob.mx/api/graphql", verify=False, json={"query": country_query} +) + +country_response.json() +``` + + + + + {'data': {'addCountry': {'country_id': 'JP'}}} + + + +Now we are going to create some cities and relate these cities with the country we just created above. To create a city we need to use a specific type for it, which is `addCity`, but if we intend to use it more than once in a single request, then we need to distinguish them with a different name for each one. To specify a name for our mutations we write it like: `name: mutation_name()`, as in the example below. + +To relate the city to our previous country, we add the `addCountry` param, with the country's id as the value. + + +```python + +# we can group several queries in one request by adding it into the mutation braces +# but if two requests are of the same type, then we have to specify a name for each one +# using the syntax: 'name: mutation_name()' like in the example below +city_query = """ + mutation { + osaka: addCity( + city_id: 6, + name: "Osaka", + population: 2691000, + addCountry: "JP" + ) { + city_id + } + suwon: addCity( + city_id: 7, + name: "Suwon", + population: 1241000, + addCountry: "JP" + ) { + city_id + } + } +""" + +# and then we make the request as usual +city_response = session.post( + "https://zendro.conabio.gob.mx/api/graphql", verify=False, json={"query": city_query} +) + +city_response.json() +``` + + + + + {'data': {'osaka': {'city_id': '6'}, 'suwon': {'city_id': '7'}}} + + + +We can also make a query request, by changing the word `mutation` to `query`. This way we, and also Zendro, knows that the query between the braces is going to be a read query and not a mutation. In the example below we are checking if the previous mutation correctly associated the cities with the country. + + +```python + +# to make a read request and not a mutation we change mutation to query +country_cities = """ + query { + readOneCountry(country_id: "JP") { + citiesFilter(pagination: { limit:10 }) { + name + } + } + } +""" + +country_cities_response = session.post( + "https://zendro.conabio.gob.mx/api/graphql", verify=False, json={"query": country_cities} +) + +country_cities_response.json() +``` + + + + + {'data': {'readOneCountry': {'citiesFilter': [{'name': 'Osaka'}, + {'name': 'Suwon'}]}}} + + + +We can **update the entries in our table with mutations, for this we need to specify the id of the entry we are going to update. It is also helpful to retrieve in the response the field we just update so we know the mutation ran correctly. + +In the next example we update the field `population` to correct the population of Japan. 
+ + +```python + +# we can make changes to a table's entry using the specific type queries for that +# in this case we use 'updateCountry' to fix the population of Japan +update_country = """ + mutation { + updateCountry( + country_id: "JP", + population: 125000000 + ) { + name + population + } + } +""" + +update_response = session.post( + "https://zendro.conabio.gob.mx/api/graphql", verify=False, json={"query": update_country} +) + +update_response.json() +``` + + + + + {'data': {'updateCountry': {'name': 'Japan', 'population': 125000000}}} + + + +Finally to **delete** an entry, first we need to disassociate it from its relations, once our entry is no longer related to any other table, we can delete it. + +In the first examples, when we created our cities, we create the city of `Suwon`, which doesn't belong to Japan, so we first disassociate it to the country, and then we can delete it, by typing its id in the `deleteCity` mutation. + +The delete mutations normally just return a string, instead of the fields of the table we are targeting. + + +```python + +# to delete an entry, first we need to disassociate the entry of its relations +delete_city = """ + mutation { + updateCity( + city_id: 7, + removeCountry: "JP" + ) { city_id country_id } + deleteCity( + city_id: 7 + ) + } +""" +# the delete query returns a string +delete_response = session.post( + "https://zendro.conabio.gob.mx/api/graphql", verify=False, json={"query": delete_city} +) + +delete_response.json() + + + {'data': {'updateCity': {'city_id': '7', 'country_id': None}, + 'deleteCity': 'Item successfully deleted'}} + +``` + +Remember you can also pass [variables](https://graphql.org/learn/queries/#variables) to a given query. This allows users to define their query and then pass in the dynamic parameters (search, order, pagination,...) at run time. + +For instance, the file [cursor-based-pagination.js](https://github.com/Zendro-dev/graphql-server-model-codegen/blob/master/test/unit_test_misc/test-describe/cursor-based-pagination.js), of Zendro's code generator uses a query (line 142) using variables: +``` + let query = \`query booksConnection($search: searchBookInput $pagination: paginationCursorInput! $order: [orderBookInput]){ + booksConnection(search:$search pagination:$pagination order:$order){ edges{cursor node{ id title + genre + publisher_id + } } pageInfo{ startCursor endCursor hasPreviousPage hasNextPage } } }\` +``` +Here the variables are `search`, `pagination` and `order`. User can pass these variables dynamically. 
+And we can send query and variables via axios: + +``` + let response = await axios.post( + remoteZendroURL, + { + query: query, + variables: {search: search, order:order, pagination: pagination}, + }, + opts + ); +``` \ No newline at end of file diff --git a/api_root.md b/api_root.md index 93c6e906..e065bcf0 100644 --- a/api_root.md +++ b/api_root.md @@ -58,7 +58,7 @@ We offer two ways to download records, namely by Zendro command line interface o The concrete instruction is elaborated here: -[ > Data Export]({% link non-developer_documentation.md %}#data-download) +[ > Data Export]({% link what_are_data_models.md %}#data-download) ### SQL Statements in the Data model diff --git a/figures/API_city.png b/figures/API_city.png new file mode 100644 index 00000000..9faa7a4a Binary files /dev/null and b/figures/API_city.png differ diff --git a/figures/API_docs2.png b/figures/API_docs2.png new file mode 100644 index 00000000..0de72251 Binary files /dev/null and b/figures/API_docs2.png differ diff --git a/figures/API_parts.png b/figures/API_parts.png new file mode 100644 index 00000000..0088d128 Binary files /dev/null and b/figures/API_parts.png differ diff --git a/figures/API_query1.png b/figures/API_query1.png new file mode 100644 index 00000000..8c98fb21 Binary files /dev/null and b/figures/API_query1.png differ diff --git a/figures/API_query2.png b/figures/API_query2.png new file mode 100644 index 00000000..17f2c006 Binary files /dev/null and b/figures/API_query2.png differ diff --git a/figures/API_query3.png b/figures/API_query3.png new file mode 100644 index 00000000..6e1926f7 Binary files /dev/null and b/figures/API_query3.png differ diff --git a/figures/API_query4.png b/figures/API_query4.png new file mode 100644 index 00000000..28f02a02 Binary files /dev/null and b/figures/API_query4.png differ diff --git a/figures/SPA_addExcel.png b/figures/SPA_addExcel.png new file mode 100644 index 00000000..ba670e97 Binary files /dev/null and b/figures/SPA_addExcel.png differ diff --git a/figures/SPA_addblankok.png b/figures/SPA_addblankok.png new file mode 100644 index 00000000..fdb8b041 Binary files /dev/null and b/figures/SPA_addblankok.png differ diff --git a/figures/SPA_addsingle.png b/figures/SPA_addsingle.png new file mode 100644 index 00000000..249aa2cc Binary files /dev/null and b/figures/SPA_addsingle.png differ diff --git a/figures/SPA_addsucess.png b/figures/SPA_addsucess.png new file mode 100644 index 00000000..6b6d1fbd Binary files /dev/null and b/figures/SPA_addsucess.png differ diff --git a/figures/SPA_associations_color.png b/figures/SPA_associations_color.png new file mode 100644 index 00000000..a503a432 Binary files /dev/null and b/figures/SPA_associations_color.png differ diff --git a/figures/SPA_associations_create.png b/figures/SPA_associations_create.png new file mode 100644 index 00000000..7d779f4b Binary files /dev/null and b/figures/SPA_associations_create.png differ diff --git a/figures/SPA_associatons_edit.png b/figures/SPA_associatons_edit.png new file mode 100644 index 00000000..f74645f1 Binary files /dev/null and b/figures/SPA_associatons_edit.png differ diff --git a/figures/SPA_associatons_save.png b/figures/SPA_associatons_save.png new file mode 100644 index 00000000..9dac52da Binary files /dev/null and b/figures/SPA_associatons_save.png differ diff --git a/figures/SPA_crud.png b/figures/SPA_crud.png new file mode 100644 index 00000000..37343117 Binary files /dev/null and b/figures/SPA_crud.png differ diff --git a/figures/SPA_csvtemplate.png 
b/figures/SPA_csvtemplate.png new file mode 100644 index 00000000..ec91465c Binary files /dev/null and b/figures/SPA_csvtemplate.png differ diff --git a/figures/SPA_download.png b/figures/SPA_download.png new file mode 100644 index 00000000..68786f89 Binary files /dev/null and b/figures/SPA_download.png differ diff --git a/figures/SPA_editoptions.png b/figures/SPA_editoptions.png new file mode 100644 index 00000000..a014f89e Binary files /dev/null and b/figures/SPA_editoptions.png differ diff --git a/figures/SPA_hidemenu.png b/figures/SPA_hidemenu.png new file mode 100644 index 00000000..f7522829 Binary files /dev/null and b/figures/SPA_hidemenu.png differ diff --git a/figures/SPA_home.png b/figures/SPA_home.png new file mode 100644 index 00000000..00377927 Binary files /dev/null and b/figures/SPA_home.png differ diff --git a/figures/SPA_login.png b/figures/SPA_login.png new file mode 100644 index 00000000..64451cc1 Binary files /dev/null and b/figures/SPA_login.png differ diff --git a/figures/SPA_models1.png b/figures/SPA_models1.png new file mode 100644 index 00000000..01d927da Binary files /dev/null and b/figures/SPA_models1.png differ diff --git a/figures/SPA_models2.png b/figures/SPA_models2.png new file mode 100644 index 00000000..31f55592 Binary files /dev/null and b/figures/SPA_models2.png differ diff --git a/figures/SPA_models3.png b/figures/SPA_models3.png new file mode 100644 index 00000000..bcec740f Binary files /dev/null and b/figures/SPA_models3.png differ diff --git a/figures/SPA_models4.png b/figures/SPA_models4.png new file mode 100644 index 00000000..eebd26ad Binary files /dev/null and b/figures/SPA_models4.png differ diff --git a/figures/SPA_models5.png b/figures/SPA_models5.png new file mode 100644 index 00000000..58db5d4c Binary files /dev/null and b/figures/SPA_models5.png differ diff --git a/figures/SPA_paginationmenu.png b/figures/SPA_paginationmenu.png new file mode 100644 index 00000000..ab7a5f26 Binary files /dev/null and b/figures/SPA_paginationmenu.png differ diff --git a/figures/SPA_rivers.png b/figures/SPA_rivers.png new file mode 100644 index 00000000..e14469d6 Binary files /dev/null and b/figures/SPA_rivers.png differ diff --git a/figures/SPA_topmenu.png b/figures/SPA_topmenu.png new file mode 100644 index 00000000..524e37f0 Binary files /dev/null and b/figures/SPA_topmenu.png differ diff --git a/figures/SPA_validation.png b/figures/SPA_validation.png new file mode 100644 index 00000000..b33bcbbd Binary files /dev/null and b/figures/SPA_validation.png differ diff --git a/fromGraphQlToR.Rmd b/fromGraphQlToR.Rmd index 4cb1554f..1fca0c05 100644 --- a/fromGraphQlToR.Rmd +++ b/fromGraphQlToR.Rmd @@ -1,63 +1,316 @@ --- -title: "Zendro from R" +title: "How to get data from Zendro to R" output: html_document: df_print: paged - pdf_document: default --- -This is a breve documentation about how to use Zendro API from R. -We will explain with examples how to fetch and use data stored in a Zendro server. -The process we will follow goes from fetching data to getting that data in a table form. +Libraries needed for this tutorial: +```{r, warning=FALSE} +library(httr) +library(jsonlite) +library(dplyr) +library(stringr) +``` -## REQUIREMENTS -In order to be able to comunicate with the server we need an R package that allow us to work with HTTP verbs (GET(), POST(), etc). -For this purpose we will use `httr` package. -We will also need other libraries for manipulating json data and tables. -NOTE: Be aware that you might need to install the correspondant packages. 
+This tutorial uses the library `httr` to establish the connection with the GraphiQL API, but there are also other options to interact with GraphQL from R. Please check the R packages: [ghql](https://docs.ropensci.org/ghql/), [gqlr](https://github.com/schloerke/gqlr) and [graphql](https://github.com/ropensci/graphql). -```{r} -library(httr) # for http request -library(jsonlite) # for manipulating json data -library(data.table) # for table manipulation -library(rlist) # for table manipulation -library(dplyr) # for table manipulation +## Introduction to GraphQL API and how to query it + +GraphQL is a query language for Application Programming Interfaces (APIs). Queries are written in the GraphQL language, and the result (the data) is given back in [JSON format](https://www.w3schools.com/whatis/whatis_json.asp). + +If you are not familiar with GraphQL, we recommend you to start by checking the [Introduction to GraphQL and querying the API](GraphQL_intro.md) of Zendro How to Guides. + +Zendro provides a GraphQL API web interface, called Graph**i**QL, which is a Web Browser tool for writing, validating, and testing GraphQL queries. + +For example, try copy-pasting and executing the following query at [https://zendro.conabio.gob.mx/api/graphql](https://zendro.conabio.gob.mx/api/graphql), which is the API that we will be using in this and other tutorials. + +```{GraphQL eval=FALSE} +{ +rivers(pagination:{limit:10, offset:0}){ + river_id + name + length +} +} ``` +(The example above only gets the first 10 results, in a section of this tutorial we will explain how to define `pagination` to pull down a given number, or all, of the items in a dataset.) + -## QUERIES EXAMPLES +## Download a small dataset (<1,000 elements): -It's possible to perform any graphql query or mutation by using a POST request. -Let's start with the most simple example where we will be fetching all `accession_id` and `collectors_name` attributes from `accessions` from a given zendro server served in the url `http://localhost:3001/graphql`. -Due to efficiency reasons it is necessary to provide a pagination limit in the query. +The function `get_from_graphQL()` defined below queries a GraphQL API and transforms the data from JSON format (which is the output of GraphQL) into a R data frame object you can easily use for further analyses. If you want to now what's going on inside this function, there is an step-by-step detailed description at the end of this document. +To start using `get_from_graphQL()` first run the code below to load the function into your R environment (you can also have it as a different file and use `source()` to run it): ```{r} -url <- "http://localhost:3001/graphql" # set the zendro server url -accessions_query <- ' -{ - accessions(pagination:{limit: 6}){ - accession_id - collectors_name + +get_from_graphQL<-function(query, url){ +### This function queries a GraphiQL API and outpus the data into a single data.frame + +## Arguments +# query: a graphQL query. It should work if you try it in graphiQL server. Must be a character string. +# url = url of the server to query. Must be a character string. + +## Needed libraries: +# library(httr) +# library(jsonlite) +# library(dplyr) +# library(stringr) + +### Function + +## query the server +result <- POST(url, body = list(query=query), encode=c("json")) + +## check server response +satus_code<-result$status_code + +if(satus_code!=200){ + print(paste0("Oh, oh: status code ", satus_code, ". 
Check your query and that the server is working")) +} + +else{ + + # get data from query result + jsonResult <- content(result, as = "text") + + # check if data downloaded without errors + # graphiQL will send an error if there is a problem with the query and the data was not dowloaded properly, even if the connection status was 200. + ### FIX this when != TRUE because result is na + errors<-grepl("errors*{10}", jsonResult) + if(errors==TRUE){ + print("Sorry :(, your data downloaded with errors, check your query and API server for details") + } + else{ + # transform to json + readableResult <- fromJSON(jsonResult, + flatten = T) # this TRUE is to combine the different lists into a single data frame (because data comming from different models is nested in lists) + + # get data + data<-as.data.frame(readableResult$data[1]) + + # rename colnames to original variable names + x<-str_match(colnames(data), "\\w*$")[,1] # matches word characters (ie not the ".") at the end of the string + colnames(data)<-x # assing new colnames + return(data) + } } } -' # write the query as a string -result <- POST(url, body = list(query=accessions_query)) # fetch data + +``` + +`get_from_graphQL()` allows you to get data of up to 1,000 elements (results of your query) at a time, which is the maximum number allowed by GraphQL for a single batch. In the next section we explain how to use `pagination` to download larger datasets in batches. + +To use the `get_from_graphQL()` function, first you have to define a GraphQL query. If you don't know how to do this, start by checking the [Introduction to GraphQL and querying the API](GraphQL_intro.md) of Zendro How to Guides. + +Once you have a GraphQL query working, you'll need to save it to an R object as a character vector: + +```{r} +my_query<- "{ +rivers(pagination:{limit:10, offset:0}){ + river_id + name + length + } +} +" +``` + +Next we use this query as an argument for `get_from_graphQL()`, along with the url of the API, which is the same of the GraphiQL web interface you explored above: + +```{r} +data<-get_from_graphQL(query=my_query, url="https://zendro.conabio.gob.mx/api/graphql") +``` + +If all wen't well you will get a data frame with the result of your query: + +```{r} +head(data) +``` + + +## Download a dataset with more than >1,000 elements: + +GraphQL outputs the resutls of a query in batches of max 1,000 elements. So if the data you want to download is larger than that, then you need to **paginate**, i.e. to get the data in batches. `pagination` is is an argument within GraphQL queries that could be done by: + +* *Limit-offset*: indicating the first element to get (`offset`, default 0) and the number of elements to get (`limit`). The `limit` can't be larger than `1000`. + +* *Cursor-based*: indicating the unique ID (`cursor`) of the element to get first, and a number of elements to get after. + +Zendro uses the limit-offset pagination with the syntaxis: + +`pagination:{limit:[integer], offset:[integer]}` + +[See GraphQL documentation](https://graphql.org/learn/pagination/) and this [tutorial on GraphQL pagination](https://daily.dev/blog/pagination-in-graphql) for more details. + +In the previous examples we downloaded only 10 elements (`pagination:{limit:10})`) from the rivers type, but the dataset is larger. (Remember, data in GraphQL is organised in **types** and **fields** within those types. When thinking about your structured data, you can think of types as the names of tables, and fields as the columns of those tables. 
In the example above `rivers` is a type and the fields are `river_id`, `name`, `length` among others.)
+
+To know how many elements a type has, we can make a query with the `count` function, if it is available for the type we are interested in. You can check this in the `Docs` at the top right menu of the GraphiQL interface.
+
+For example, `rivers` has the function `countRivers`, so with the query `{countRivers}` we can get the total number of rivers.
+
+Similar to how we got data before, you can use this very simple query in the function `get_from_graphQL` to get the number of rivers into R:
+
+```{r}
+# query API with count function
+no_records<-get_from_graphQL(query="{countRivers}", url="https://zendro.conabio.gob.mx/api/graphql")
+
+# change to vector, we don't need a df
+no_records<-no_records[1,1]
+no_records
+```
+
+In this case we have `r no_records`. Technically we could download all the data in a single batch because it is <1,000, but for demonstration purposes we will download it in batches of 10.
+
+The following code calculates the number of pages needed to get a given number of records assuming a desired limit (size of each batch). Then it runs `get_from_graphQL()` within a loop for each page until getting the total number of records desired.
+
+
+```{r}
+# Define desired number of records and limit. Number of pages and offset will be estimated based on the number of records to download
+no_records<- no_records # this was estimated above with a query to count the total number of records, but you can also manually change it to a custom desired number
+my_limit<-10 # max 1000.
+no_pages<-ceiling(no_records/my_limit)
+
+## Define offset.
+# You can use the following loop:
+# to calculate the offset automatically based on
+# the number of pages needed.
+my_offset<-0 # start in 0. Leave like this
+for(i in 1:no_pages){ # loop to calculate offsets
+  my_offset<-c(my_offset, my_limit*i)
+}
+
+# Or you can define the offset manually
+# uncommenting the following line
+# and commenting the loop above:
+# my_offset<-c(#manually define your vector)
+
+## create object where to store downloaded data. Leave empty
+data<-character()
+
+##
+## Loop to download the data from GraphQL using pagination
+##
+
+for(i in c(1:length(my_offset))){
+
+# Define pagination
+pagination <- paste0("limit:", my_limit, ", offset:", my_offset[i])
+
+# Define query looping through desired pagination:
+my_query<- paste0("{
+  rivers(pagination:{", pagination, "}){
+      river_id
+      name
+      length
+   }
+   }
+   ")
+
+
+
+# Get data and add it to the already created df
+data<-rbind(data, get_from_graphQL(query=my_query, url="https://zendro.conabio.gob.mx/api/graphql"))
+
+#end of loop
+}
+
+```
+
+As a result you will get all the data in a single df:
+
+```{r}
+head(data)
+summary(data)
+```
+
+## `get_from_graphQL()` explained step by step
+
+The following is a step-by-step example explaining in more detail how the function `get_from_graphQL()` that we used above works.
+
+First, once you have a GraphQL query working, you'll need to save it to an R object as a character vector:
+
+```{r}
+my_query<- "{
+rivers(pagination:{limit:10, offset:0}){
+      river_id
+      name
+      length
+   }
+}
+"
+```
-The `result` that we are getting is the `http` response. We still need to extract the data in order to be able to manipulate it.
-If everything went well, the `http` response will contain an attribute `data` which will itself contain an attribute named as the query, in this case `accessions`.
+Next, define as another character vector the url of the API, which is the same as the url of the GraphiQL web interface you explored above:
+
+```{r}
+url<-"https://zendro.conabio.gob.mx/api/graphql"
+```
+
+
+Now we can send a query to the API by using a POST request:
+
+```{r}
+# query server
+result <- POST(url, body = list(query=my_query), encode = c("json"))
+```
+
+The result that we are getting is the `http` response. Before checking if we got the data, it is good practice to verify that the connection was successful by checking the status code. A `200` means that all went well. Any other code means problems. See [this list of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes).
+
+```{r}
+# check server response
+result$status_code
+```
+
+We now need to extract the data in order to be able to manipulate it. If everything went well, the `http` response will contain an attribute `data`, which will itself contain an attribute named after the query, in this case `rivers`.
+
+
+```{r}
+result
+```
+
+If the query is not written properly or if there is any other error, the attribute `data` won't exist and instead we will get the attribute `errors` listing the errors found.
+
+If all went well we can proceed to extract the content of the results with:
+
+```{r}
+# get data from query result
+jsonResult <- content(result, as = "text")
+
+```
-With the above code we were able to visualize the data fetched from zendro server.
-Next we will put this data in a table form in order to be able to manipulate the data.
+
+The result will be in JSON format, which we can convert into an R object (a list). In this list the results are nested within each type used in the query. The argument `flatten` is used to collapse the data from the different types into a single data frame.
+
```{r}
-dataTable <- lapply(readableText[[1]], as.data.table)
-dataTable$accessions
-```
\ No newline at end of file
+# transform to json
+readableResult <- fromJSON(jsonResult,
+                         flatten = T)
+```
+
+Extract data:
+
+```{r}
+# get data
+data<-as.data.frame(readableResult$data[1])
+head(data)
+```
+
+By default, the name of each type will be added at the beginning of each column name:
+
+```{r}
+colnames(data)
+```
+
+To keep only the name of the variable as it is in the original data:
+```{r}
+x<-str_match(colnames(data), "\\w*$")[,1] # matches word characters (ie not the ".") at the end of the string
+colnames(data)<-x # assign new colnames
+```
+
+So finally we have the data in a single nice looking data frame:
+```{r}
+head(data)
+```
+Notice that you will get a data frame like the one above only for one-to-one associations; in other cases you will still get variables that are lists, which you can process in a separate step.
diff --git a/fromGraphQlToR.html b/fromGraphQlToR.html
index dffd46b1..72d04f82 100644
--- a/fromGraphQlToR.html
+++ b/fromGraphQlToR.html
@@ -1,27 +1,36 @@
-Zendro from R
+How to get data from Zendro to R
+code{white-space: pre-wrap;}
+span.smallcaps{font-variant: small-caps;}
+span.underline{text-decoration: underline;}
+div.column{display: inline-block; vertical-align: top; width: 50%;}
+div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+ul.task-list{list-style: none;}
+

Download a dataset with more than 1,000 elements:

+

GraphQL outputs the results of a query in batches of at most 1,000 elements. So if the data you want to download is larger than that, then you need to paginate, i.e. to get the data in batches. pagination is an argument within GraphQL queries that can be specified in two ways:

+
    +
  • Limit-offset: indicating the first element to get +(offset, default 0) and the number of elements to get +(limit). The limit can’t be larger than +1000.

  • +
  • Cursor-based: indicating the unique ID +(cursor) of the element to get first, and a number of +elements to get after.

  • +
+

Zendro uses limit-offset pagination with the syntax:

+

pagination:{limit:[integer], offset:[integer]}

+

See GraphQL +documentation and this tutorial on GraphQL +pagination for more details.
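For instance, with limit-offset pagination the second batch of 10 records is requested by keeping the same limit and moving the offset forward. A minimal sketch in R, assuming the same rivers type used in this tutorial:

```r
# Hypothetical illustration: records 1-10 use offset 0, records 11-20 use offset 10,
# while the limit stays fixed at 10
first_batch  <- "{rivers(pagination:{limit:10, offset:0}){ river_id name }}"
second_batch <- "{rivers(pagination:{limit:10, offset:10}){ river_id name }}"
```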

+

In the previous examples we downloaded only 10 elements (pagination:{limit:10}) from the rivers type, but the dataset is larger. (Remember, data in GraphQL is organised in types and fields within those types. When thinking about your structured data, you can think of types as the names of tables, and fields as the columns of those tables. In the example above rivers is a type and the fields are river_id, name and length, among others.)

+

To know how many elements a type has, we can make a query with the count function, if it is available for the type we are interested in. You can check this in the Docs at the top right menu of the GraphiQL interface.

+

For example, rivers has the function countRivers, so with the query {countRivers} we can get the total number of rivers.

+

Similar to how we got data before, you can use this very simple query +in the function get_from_graphQL to get the number of +rivers into R:

+
# query API with count function
+no_records<-get_from_graphQL(query="{countRivers}", url="https://zendro.conabio.gob.mx/api/graphql")
+
+# change to vector, we don't need a df
+no_records<-no_records[1,1]
+no_records
+
## [1] 50
+

In this case we have 50. Technically we could download all the data in a single batch because it is <1,000, but for demonstration purposes we will download it in batches of 10.

+

The following code calculates the number of pages needed to get a +given number of records assuming a desired limit (size of each batch). +Then it runs get_from_graphQL() within a loop for each page +until getting the total number of records desired.

+
# Define desired number of records and limit. Number of pages and offset will be estimated based on the number of records to download
+no_records<- no_records # this was estimated above with a query to count the total number of records, but you can also manually change it to a custom desired number
+my_limit<-10 # max 1000. 
+no_pages<-ceiling(no_records/my_limit)
+
+## Define offset.
+# You can use the following loop:
+# to calculate the offset automatically based on 
+# the number of pages needed.
+my_offset<-0 # start in 0. Leave like this
+for(i in 1:no_pages){ # loop to calculate offsets
+  my_offset<-c(my_offset, my_limit*i)
+}
+
+# Or you can define the offset manually 
+# uncommenting the following line
+# and commenting the loop above:
+# my_offset<-c(#manually define your vector) 
+
+## create object where to store downloaded data. Leave empty
+data<-character()
+
+##
+## Loop to download the data from GraphQL using pagination
+## 
+
+for(i in c(1:length(my_offset))){
+
+# Define pagination
+pagination <- paste0("limit:", my_limit, ", offset:", my_offset[i])
+
+# Define query looping through desired pagination:
+my_query<- paste0("{
+  rivers(pagination:{", pagination, "}){
+      river_id
+      name
+      length
+   }
+   } 
+   ")
+
+
+
+# Get data and add it to the already created df
+data<-rbind(data, get_from_graphQL(query=my_query, url="https://zendro.conabio.gob.mx/api/graphql"))
+
+#end of loop
+}
+

As a result you will get all the data in a single df:

+
head(data)
+
+ +
+
summary(data)
+
##    river_id             name               length      
+##  Length:50          Length:50          Min.   :  65.0  
+##  Class :character   Class :character   1st Qu.: 150.0  
+##  Mode  :character   Mode  :character   Median : 283.0  
+##                                        Mean   : 347.1  
+##                                        3rd Qu.: 402.5  
+##                                        Max.   :1521.0  
+##                                        NA's   :6
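At this point the assembled data frame behaves like any other R data frame, so you can, for example, keep a local copy of it. A small sketch, assuming you want a CSV file in your working directory (the file name is just an example):

```r
# Optional: save the downloaded records for later use
write.csv(data, "rivers_from_zendro.csv", row.names = FALSE)
```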
+
+
+

get_from_graphQL() explained step by step

+

The following is a step-by-step example explaining in more detail how the function get_from_graphQL() that we used above works.

+

First, once you have a GraphQL query working, you’ll need to save it +to an R object as a character vector:

+
my_query<- "{
+rivers(pagination:{limit:10, offset:0}){
+      river_id
+      name
+      length
+   }
 }
-'                                                                     # write the query as a string
-result <- POST(url, body = list(query=accessions_query))              # fetch data
-

The result that we are getting is the http response. We still need to extract the data in order to be able to manipulate it. If everything went well, the http response will contain an attribute data which will itself contain an attribute named as the query, in this case accessions.

-
jsonResult <- content(result, as = "text") 
-readableText <- fromJSON(jsonResult)
-readableText$data$accessions
+" +

Next, define as another character vector the url of the API, which is +the same of the GraphiQL web interface you explored above:

+
url<-"https://zendro.conabio.gob.mx/api/graphql"
+

Now we can send a query to the API by using a POST request:

+
# query server
+result <- POST(url, body = list(query=my_query), encode = c("json"))
+

The result that we are getting is the http response. +Before checking if we got the data, it is good practice to verify if the +connection was successful by checking the status code. A +200 means that all went well. Any other code means +problems. See this.

+
# check server response
+result$status_code
+
## [1] 200
+

We now need to extract the data in order to be able to manipulate it. If everything went well, the http response will contain an attribute data, which will itself contain an attribute named after the query, in this case rivers.

+
result
+
## Response [https://zendro.conabio.gob.mx/api/graphql]
+##   Date: 2022-07-27 23:15
+##   Status: 200
+##   Content-Type: application/json; charset=utf-8
+##   Size: 983 B
+## {
+##   "data": {
+##     "rivers": [
+##       {
+##         "river_id": "1",
+##         "name": "Acaponeta",
+##         "length": 233
+##       },
+##       {
+##         "river_id": "10",
+## ...
+
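You can also confirm programmatically that the body is JSON before parsing it; a small sketch using httr:

```r
# Returns the MIME type of the response body, e.g. "application/json"
http_type(result)
```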

If the query is not written properly or if there is any other error, the attribute data won't exist and instead we will get the attribute errors listing the errors found.

+

If all went well we can proceed to extract the content of the results with:

+
# get data from query result
+jsonResult <- content(result, as = "text") 
+

The result will be in JSON format, which we can convert into an R object (a list). In this list the results are nested within each type used in the query. The flatten argument is used to collapse the data from the different types into a single data frame.

+
# transform to json
+readableResult <- fromJSON(jsonResult, 
+                         flatten = T)
+
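If you want to see how the parsed object is organised before extracting the data frame, you can inspect its structure; a quick sketch:

```r
# Show the top levels of the parsed result: a $data element
# containing one entry per type requested in the query
str(readableResult, max.level = 2)
```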

Extract data:

+
# get data
+data<-as.data.frame(readableResult$data[1]) 
+head(data)
-

With the above code we were able to visualize the data fetched from zendro server. Next we will put this data in a table form in order to be able to manipulate the data.

-
dataTable <- lapply(readableText[[1]], as.data.table) 
-dataTable$accessions
+

By default, the name of each type will be added at the beginning of each column name:

+
colnames(data)
+
## [1] "rivers.river_id" "rivers.name"     "rivers.length"
+

To keep only the name of the variable as it is in the original +data:

+
x<-str_match(colnames(data), "\\w*$")[,1] # matches word characters (ie not the ".") at the end of the string
+colnames(data)<-x # assign new colnames
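An equivalent way to drop the type prefix, if you prefer base R over stringr, is a simple substitution; a sketch under the assumption that the prefix is always separated by a dot:

```r
# Remove everything up to and including the last "." in each column name
colnames(data) <- sub("^.*\\.", "", colnames(data))
```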
+

So finally we have the data in a single nice looking data frame:

+
head(data) 
+

Notice that you will get a data frame like the one above only for one-to-one associations; in other cases you will still get variables that are lists, which you can process in a separate step.
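If your query traverses an association and some columns come back as lists, one way to expand them is with tidyr; a sketch assuming a hypothetical countries list-column produced by a to-many association:

```r
library(tidyr)

# Hypothetical example: expand a list-column into one row
# per river/country combination, keeping rivers with no countries
flat_data <- unnest(data, cols = c(countries), keep_empty = TRUE)
```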

@@ -1700,7 +1994,7 @@

QUERIES EXAMPLES

// add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { - $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); + $('tr.odd').parent('tbody').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); @@ -1718,7 +2012,7 @@

QUERIES EXAMPLES

$(document).ready(function () { $('.tabset-dropdown > .nav-tabs > li').click(function () { - $(this).parent().toggleClass('nav-tabs-open') + $(this).parent().toggleClass('nav-tabs-open'); }); }); @@ -1737,4 +2031,4 @@

QUERIES EXAMPLES

- + \ No newline at end of file diff --git a/index.md b/index.md index 46241078..786c2106 100644 --- a/index.md +++ b/index.md @@ -4,7 +4,7 @@ layout: home nav_order: 1 --- -![Zendro logo](figures/Zendro_logo_horizontal.png) +![Zendro logo](./figures/Zendro_logo_horizontal.png) # Zendro @@ -13,41 +13,70 @@ Zendro is a software tool to quickly create a data warehouse tailored to your sp Zendro consists of two main components, backend and frontend. The backend component has its [base project](https://github.com/ScienceDb/graphql-server) and a [code generator](https://github.com/ScienceDb/graphql-server-model-codegen). The frontend of SPA (Single Page Application) also has its [base project](https://github.com/ScienceDb/single-page-app). See the guides below on how to use Zendro. -Also find Zendro-dev on [github](https://github.com/Zendro-dev). +To see or contribute to our code please visit Zendro-dev on [github](https://github.com/Zendro-dev), where you can find the repositories for: -If you have any questions or comments, please don't hesitate to contact us via an issue [here](https://github.com/Zendro-dev/Zendro-dev.github.io/issues). Tag your issue as a question and we will try to answer as quick as possible. +* [GraphQL server](https://github.com/ScienceDb/graphql-server) +* [GraphQL server model generator](https://github.com/ScienceDb/graphql-server-model-codegen) +* [Single page application](https://github.com/ScienceDb/single-page-app) + +If you have any questions or comments, please don't hesitate to contact us via an issue [here](https://github.com/Zendro-dev/Zendro-dev.github.io/issues). Tag your issue as a question or bug and we will try to answer as quick as possible. + +## SHOW ME HOW IT LOOKS! + +Would you like to see Zendro in action before deciding to learn more? That's fine! We set up a Dummy Zendro Instance for you to explore [Zendro's graphical user interface](https://zendro.conabio.gob.mx/spa) and [Zendro's API](https://zendro.conabio.gob.mx/graphiql). The tutorials on how to [use Zendro day to day](https://zendro-dev.github.io/#using-zendro-day-to-day) of the section below use this instance, so go there to start exploring. + +### Installation and sysadmin + +To start trying Zendro you can try the [Quickstart tutorial](https://zendro-dev.github.io/quickstart.html) on how to create a new Zendro project with pre-defined datamodels, database and environment variables. Then you can try the [Getting started tutorial](https://zendro-dev.github.io/setup_root.html), a step-by-step guide on how to create a new Zendro project from scratch, aimed at software developers and system administrators. [![Go to Quickstart](./figures/quick.png)]({% link quickstart.md %}) [![Go to Getting started guide](./figures/gettingstarted.png)]({% link setup_root.md %}) +For more sysadmin details also check: ### HOW-TO GUIDES: -* [How to define data models: for developers]({% link setup_data_scheme.md %}). Detailed technical specifications on how to define data models for Zendro, aimed at software developers and system administrators. -* [How to define data models: for non-developers]({% link non-developer_documentation.md %}). A brief, illustrated guide of data model specifications, data formatting and data uploading options, aimed at data modelers or managers to facilitate collaboration with developers. -* [How to setup a distributed cloud of zendro nodes]({% link ddm.md %}). A brief guide, aimed at software developers and system administrators, on how to use Zendros distributed-data-models. 
* [How to use Zendro command line interface (CLI)]({% link zendro_cli.md %}). A tutorial of Zendro CLI, aimed at software developers. -* [How to query and extract data]({% link fromGraphQlToR.html %}). A concise guide on how to use the Zendro API from R to extract data and perform queries, aimed at data managers or data scientists. * [How to setup Authentication / Authorization]({% link oauth.md %}). A concise guide on how to use and setup the Zendro authorization / authentication services. -* [API documentation]({% link api_root.md %}). +* [How to setup a distributed cloud of Zendro nodes]({% link ddm.md %}). A brief guide, aimed at software developers and system administrators, on how to use Zendros distributed-data-models. +* [API documentation]({% link api_root.md %}). A summary of how Zendro backend generator implements a CRUD API that can be accessed through GraphQL query language. -### REPOSITORIES: +### Defining data models +* [How to define data models: for developers]({% link setup_data_scheme.md %}). Detailed technical specifications on how to define data models for Zendro, aimed at software developers and system administrators. +* [How to define data models: for non-developers]({% link what_are_data_models.md %}). A brief, illustrated guide of data model specifications, data formatting and data uploading options, aimed at data modelers or managers to facilitate collaboration with developers. -* [GraphQL server](https://github.com/ScienceDb/graphql-server) -* [GraphQL server model generator](https://github.com/ScienceDb/graphql-server-model-codegen) -* [Single page application](https://github.com/ScienceDb/single-page-app) +### Using Zendro day to day +* [How to use Zendro's graphical interface]({% link SPA_howto.md %}). A full guide on how to use Zendro's graphical point and click interface. Aimed to general users and featuring lots of screenshots. +* [Introduction to GraphQL and querying the API]({% link GraphQL_intro.md %}). A friendly intro to how to perform GraphQL queries and use GraphiQL documentation. +* [How to query and extract data from R]({% link fromGraphQlToR.html %}). A concise guide on how to use the Zendro API from R to extract data and perform queries, aimed at data managers or data +* [How to use the Zendro API with python to make changes to the data]({% link Zendro_requests_with_python.md %}). A concise guide on how to access the API using your user credentials to make CRUD operations on the data using python. + +## Zendro users profiles -### CONTRIBUTIONS +We designed Zendro to be useful for research teams and institutions that include users with different areas of expertise, needs and type of activities. The table below summarizes how we envision that different users will use Zendro: + + +| Profile | Background | Expected use | +|-------------------------------|-----------------|-----------------------| +| General user / scientist | * Very experienced using Excel for data manipulation, visualization and basic data analysis.
* Data producer.
* Extensive domain knowledge.
* No programming experience. | * Access Zendro SPA to see what data has been uploaded.
* CRUD single records through the SPA or add several records through uploading a csv file.
* Download data through the SPA.
* Simple queries through the SPA.
* Download data through the SPA.| +| Data scientist / data analyst | * Experienced using Excel for data manipulation, visualization and basic data analysis.
* Experienced in data manipulation, visualization and analysis through programming languages like Python or R.
* Some experience accessing and using APIs. | * Access Zendro instances to see what data have been uploaded.
* Add and modify single records through the SPA.
* Complex queries through the GraphQL API.
* Download query results through the GraphQL API.
* Connect external apps through the API (e.g. Shiny apps in R).| +| Data manager | * Experienced in data manipulation, visualization and analysis, mainly in programming languages like Python or R, but also in Excel.
* Experienced in data standards.
* Experienced in the management and oversight of an organization’s data lifecycle. | * Design data models (json) for new Zendro instances.
* Transform raw data to csv files according to specified data models and following Zendro’s data format requirements.
* Link users’ needs to technical solutions (ie, facilitate communication between general users or analysts, and developers).
* Download and manipulate data through the API.
* CRUD data through the API, in batch. | +| Sysadmin | * Experienced in system administration and server maintenance.
* Experienced in designing, building and maintaining software and data services.
* Some experience in front-end and back-end development.| * Solve technical problems that arise in specific Zendro instances and Zendro in general.
* Set up and configure new Zendro instances.
* Customize back-end and front-end components for Zendro instances.
* Check and confirm that data models and csv data files satisfy Zendro requirements and adjust them if necessary.
* Upload, delete and update data through the API in batch.
* Zendro maintenance. | + +# CONTRIBUTIONS Zendro is the product of a joint effort between the Forschungszentrum Jülich, Germany and the Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, México, to generate a tool that allows efficiently building data warehouses capable of dealing with diverse data generated by different research groups in the context of the FAIR principles and multidisciplinary projects. The name Zendro comes from the words Zenzontle and Drossel, which are Mexican and German words denoting a mockingbird, a bird capable of “talking” different languages, similar to how Zendro can connect your data warehouse from any programming language or data analysis pipeline. -#### Zendro contributors in alphabetical order -Francisca Acevedo1, Vicente Arriaga1, Katja Dohm3, Constantin Eiteneuer2, Sven Fahrner2, Frank Fischer4, Asis Hallab2, Alicia Mastretta-Yanes1, Roland Pieruschka2, Alejandro Ponce1, Yaxal Ponce2, Francisco Ramírez1, Irene Ramos1, Bernardo Terroba1, Tim Rehberg3, Verónica Suaste1, Björn Usadel2, David Velasco2, Thomas Voecking3 +### Zendro contributors in alphabetical order +Francisca Acevedo1, Vicente Arriaga1, Vivian Bass1, Katja Dohm3, Jaime Donlucas1, Constantin Eiteneuer2, Sven Fahrner2, Frank Fischer4, Asis Hallab2, Alicia Mastretta-Yanes1, Roland Pieruschka2, Erick Palacios-Moreno1, Alejandro Ponce1, Yaxal Ponce2, Francisco Ramírez1, Irene Ramos1, Bernardo Terroba1, Tim Rehberg3, Ulrich Schurr2, Verónica Suaste1, Björn Usadel2, David Velasco2, Thomas Voecking3 and Dan Wang2 -##### Author affiliations +#### Author affiliations 1. CONABIO - Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, México -2. Forschungszentrum Jülich - Germany -3. auticon - www.auticon.com +2. Forschungszentrum Jülich, Germany +3. Auticon - www.auticon.com 4. InterTech - www.intertech.de -#### Zendro author contributions -Asis Hallab and Alicia Mastretta-Yanes coordinated the project. Asis Hallab designed the software. Programming of code generators, the browser based single page application interface, and the GraphQL application programming interface was done by Katja Dohm, Constantin Eiteneuer, Francisco Ramírez, Tim Rehberg, Veronica Suaste, David Velasco, Thomas Voecking, and Dan Wang. Counselling and use case definitions were contributed by Francisca Acevedo, Vicente Arriaga, Frank Fischer, Roland Pieruschka, Alejandro Ponce, Irene Ramos, and Björn Usadel. User experience and application of Zendro on data management projects was carried out by Asis Hallab, Alicia Mastretta-Yanes, Yaxal Ponce, Irene Ramos, Verónica Suaste, and David Velasco. Logo design was made by Bernardo Terroba. +#### Author contributions +Conceptualization, management and coordination of the project was done by Asis Hallab and Alicia Mastretta-Yanes. Software design was done by Asis Hallab. Programming, implementation and testing of the computer code was done by Vivian Bass, Katja Dohm, Constantin Eiteneuer, Asis Hallab, Francisco Ramírez, Tim Rehberg, Veronica Suaste, David Velasco, Thomas Voecking and Dan Wang. Use case definitions were provided by Frank Fischer, Roland Pieruschka, Irene Ramos, and Björn Usadel. Acquisition of the financial support for the project was contributed by Francisca Acevedo, Vicente Arriaga and Björn Usadel. 
User experience and application of Zendro on data management projects was carried out by Vivian Bass, Jaime Donlucas, Asis Hallab, Alicia Mastretta-Yanes, Erick Palacios Moreno, Alejandro Ponce, Yaxal Ponce, Irene Ramos, Verónica Suaste and David Velasco. Writing the original draft of the manuscript and software documentation was done by Vivian Bass, Constantin Eiteneuer, Asis Hallab, Alicia Mastretta-Yanes, Irene Ramos, Verónica Suaste and Dan Wang. Logo design was done by Bernardo Terroba.
+
+### Funding
+On the Mexican side, Zendro was developed with resources of the Global Environmental Fund through the Mexican Agrobiodiversity Project (GEF Project ID: 9380). On the German side Zendro was developed with resources of the projects [german grants here].
\ No newline at end of file
diff --git a/quickstart.md b/quickstart.md
index 2168743c..6ee044d4 100644
--- a/quickstart.md
+++ b/quickstart.md
@@ -21,7 +21,7 @@ If you want to know more about Zendro or a detailed explanation on how to set up
---
## Project Requirements:
- * [NodeJS](https://nodejs.org/en/) version 17+ is required.
+ * [NodeJS](https://nodejs.org/en/) version 16+ is required.
 * [docker](https://docs.docker.com/get-docker/)
 * [docker-compose](https://docs.docker.com/compose/install/#install-compose)

diff --git a/setup_data_scheme.md b/setup_data_scheme.md index 5f3feb98..a35b0717 100644 --- a/setup_data_scheme.md +++ b/setup_data_scheme.md @@ -34,7 +34,7 @@ Name | Type | Description *url* | String | This field is only mandatory for __zendro\_server__ stored models. Indicates the url where the zendro server storing the model is runnning. *attributes* | Object | The key of each entry is the name of the attribute and there are two options for the value . It can be either a string indicating the type of the attribute or an object where the user indicates the type of the attribute(in the _type_ field) together with an attribute's description (in the _description_ field). See the [table](#supported-data-types) below for allowed types. Example of option one: ```{ "attribute1" : "String", "attribute2: "Int" }``` Example of option two: ``` { "attribute1" : {"type" :"String", "description": "Some description"}, "attribute2: "Int ``` *associations* | Object | The key of each entry is the name of the association and the value should be an object describing the corresponding association. See [Associations Spec](#associations-spec) section below for details. -*indices* | [String] | Names of attributes for generating corresponding indices. +*indices* | [String] | Attributes for generating corresponding indices. By default, indices would be generated for *internalId*. And it is recommended to add indices for attributes which are foreign keys. *operatorSet* | String | It is possible to specify the operator set for generic models, distributed adapters and zendro servers. The following operator set are supported: `GenericPrestoSqlOperator`, `MongodbNeo4jOperator`, `CassandraOperator`, `AmazonS3Operator`. See [documentation of operators]({% link api_graphql.md %}#operators) for details. *internalId* | String | This string corresponds to the name of the attribute that uniquely identifies a record. If this field is not specified, an _id_, default attribute, will be added. *spaSearchOperator* | 'like' \| 'iLike' | Optional attribute to specify which operator should be used for the single-page-app text search-field. Defaults to iLike diff --git a/setup_root.md b/setup_root.md index c56e4f3e..578e0b9e 100644 --- a/setup_root.md +++ b/setup_root.md @@ -20,7 +20,7 @@ This is a step-by-step guide on how to create a new Zendro project from scratch, Zendro consists of four source-code projects: __graphql-server-model-codegen__, __graphql-server__, __single-page-app__ and __graphiql-auth__. The first pair is responsible for the back-end [GraphQL](https://graphql.org/learn/) service that can be accessed on the default url `http://localhost:3000/graphql`. To pull up the corresponding server it is required to generate some code first. The third project acts as a client of the GraphQL server and creates a simple generic web-based GUI for this server on the url `http://localhost:8080`. The last project offers a Zendro specific implementation of the browser based GraphQL IDE [Graphiql](https://github.com/graphql/graphiql). The project is a simple [Next.js](https://nextjs.org/) application. Custom adjustments have been made to accommodate Zendro requirements for authentication of users and enhanced meta searches using [jq](https://stedolan.github.io/jq/) or [JSONPath](https://goessner.net/articles/JsonPath/) statements. ## Project Requirements: - * [NodeJS](https://nodejs.org/en/) version 17+ is required. + * [NodeJS](https://nodejs.org/en/) version 16+ is required. 
**recommended for setting up zendro using docker** * [docker](https://docs.docker.com/get-docker/) @@ -70,9 +70,11 @@ If you wish to know more about enviroment variables you can check [this]({% link ### Step 4: Define your data models -Add your model definitions in JSON files to `./data_model_definitions` folder. +Add your model definitions in JSON files to `./data_model_definitions` folder. -If you want to learn more about how to define data models with Zendro, please check [this]({% link setup_data_scheme.md %}). +If you want to learn more about how to define data models with Zendro, please check [this]({% link setup_data_scheme.md %}). + +Note: by default, indices would be generated for *internalId*. And it is recommended to add indices for attributes which are foreign keys. See the [json specs]({% link setup_data_scheme.md %}#json-specs) for more information. ### Step 5: Generate code and migrations @@ -313,3 +315,7 @@ A couple of basic extensions are suggested to be introduced directly into the Gr Furthermore, the whole codebase used to run zendro is exposed and can be directly customized if needed. That is true for the graphql-server as well as the frontend applications. [ > Advanced code customizing]({% link setup_customize.md %}) + +### Add empty or default plots (optional) + +Empty or default plots could be generated via zendro CLI. Please see the instruction [here]({% link zendro_cli.md %}#plots) \ No newline at end of file diff --git a/usage.md b/usage.md new file mode 100644 index 00000000..387a0938 --- /dev/null +++ b/usage.md @@ -0,0 +1,11 @@ +--- +layout: default +title: Usage +nav_order: 4 +has_children: true +--- + +# Usage +{: .no_toc } + +This guide is aimed at data modelers, data managers and other data users to facilitate collaboration with developers in designing Zendro-generated database systems. We assume that as a data modeler or manager, you might be responsible for structuring your database, and work with a developer or system administrator to set up the project. In this guide we describe and illustrate the requirements for data models, which are the main input for Zendro, and then follow up to describe data uploading options. If you want to dive deeper into the installation process from scratch, see this [tutorial]({% link setup_root.md %}) on how to set up a new project. \ No newline at end of file diff --git a/non-developer_documentation.md b/what_are_data_models.md similarity index 91% rename from non-developer_documentation.md rename to what_are_data_models.md index 608a8d23..3c860d2a 100644 --- a/non-developer_documentation.md +++ b/what_are_data_models.md @@ -1,14 +1,13 @@ --- layout: default -title: Doc for non-developer -nav_order: 4 +title: What are data models? +parent: Usage +nav_order: 1 +permalink: /usage/data_models --- - -# Doc for non-developer +# What are data models? {: .no_toc } -This guide is aimed at data modelers, data managers and other data users to facilitate collaboration with developers in designing Zendro-generated database systems. We assume that as a data modeler or manager, you might be responsible for structuring your database, and work with a developer or system administrator to set up the project. In this guide we describe and illustrate the requirements for data models, which are the main input for Zendro, and then follow up to describe data uploading options. If you want to dive deeper into the installation process from scratch, see this [tutorial]({% link setup_root.md %}) on how to set up a new project. 
- ## Table of contents {: .no_toc .text-delta } @@ -25,13 +24,13 @@ Zendro takes as input a set of data models described in JSON files, from which i Let's assume we want to create a small database for a herbarium of medicinal plants. We need data models for specimens, taxonomic information, collection information, and uses (Figure 1). In our case, the taxonomic information is specific to each plant, so there is a One-to-One association between specimen and taxon. Each plant can have many uses, just like there are many plants that can serve the same function, so the association between specimen and uses is Many-to-Many. Finally, a specimen belongs to one collection only, but a collection may store multiple specimens, so there is a One-to-Many association between these models (Figure 1). -![Figure 1](figures/figure1.png) +![Figure 1](../figures/figure1.png) In this example, information about specimens, taxonomic data and uses is stored in a local server. But we will assume that the information about collections is in a remote database, perhaps one that holds other types of plants, and we would connect to that database to access only the attributes we are interested in. Next, we need to list the attributes of each model and their data types (Figure 2). Allowed data types are: String, Int, Float, Boolean, [Date, Time and DateTime](https://github.com/excitement-engineer/graphql-iso-date/blob/HEAD/rfc3339.txt). Each model also requires an attribute that serves as the *primary key* or unique identifier of each record. -![Figure 2](figures/figure2.png) +![Figure 2](../figures/figure2.png) Foreign keys are also needed to establish the associations to other data models; their location depends on the association type. Note that in Many-to-Many associations, it is necessary to define an additional model for the cross table of foreign-key pairs that define the association. For more details, please read the documentation on associations in [Sequelize](https://sequelize.org/master/manual/assocs.html), on which Zendro is based. diff --git a/zendro_cli.md b/zendro_cli.md index d0ecf2a3..dadcda20 100644 --- a/zendro_cli.md +++ b/zendro_cli.md @@ -143,6 +143,18 @@ zendro bulk-create -s, --sheet_name: Sheet name for XLSX file. By default process the first sheet. -r, --remote_server: Upload to a remote server (default: false). ``` + +### Download records +``` +zendro bulk-download + + Usage: zendro bulk-download [options] + + Options: + -f, --file_path: File path. + -n, --model_name: Model name. + -r, --remote_server: Download from a remote server (default: false). +``` ### Set up a quick sandbox ``` zendro set-up @@ -153,6 +165,25 @@ zendro set-up -d, --dockerize: Keep Docker files (default: false). ``` +### Create empty or default plots +``` +zendro create-plot + + Usage: zendro create-plot [options] + + Options: + -p, --default_plots: Create default plots (default: false). + -f, --plot_name: Customized plot name. + -t, --type: The visualization library (options: "plotly", "d3"). + -m, --menu: The location of the plot menu (options: "none", "top", "left"). + -n, --menu_item_name: The item name in the plot menu (default value is the plot name). 
+``` +Hints: +The meaning of options in "menu" (-m): +* "top": the navigation of the plot would be located in the top menu bar with the menu item name +* "left": the sub-menu for the plot would be generated in the left menu with the menu item name +* "none": no navigation + ## A Quick Example for setting up a Zendro Sandbox Please go to [Quickstart]({% link quickstart.md %}) guide to set up a Zendro Sandbox. @@ -223,3 +254,35 @@ In general, it is possible to download all data into CSV format in two ways, eit 1. If the Zendro instance is installed locally, then user can execute the command in the `graphql-server` folder: `zendro bulk-download -f -n `. To configure delimiters (`ARRAY_DELIMITER`, `FIELD_DELIMITER` and `RECORD_DELIMITER`) and record-limit (`LIMIT_RECORDS`), set the according environment variables in `graphql-server/.env` 2. If the Zendro instance is accessible remotely, modify the `zendro/.env.migration` configuration file to map to the remote Zendro instance. After that, execute `zendro bulk-create -f -n -r` to download the records to CSV. + +## Plots +It is possible to generate default or empty plots via CLI. +### Create empty plots +When a user wants to create a empty plot, the plot name (-f) and the visualization library (-t) must be specified. Other options for the location of the plot menu (-m) and the item name in the plot menu (-n) are optional. For example, an empty plot named `barchart` with `plotly` library at the top menu bar could be generated by executing the following command in the `single-page-app` folder: +``` +zendro create-plot -f barchart -t plotly -m top +``` +Then user can customize the data processing in the file `single-page-app/src/pages/barchart.tsx`. In the `fetchData` function, user can pass the required query as argument in the `zendro.request` function, process the response `res` into a desired format and set that into `data` variable. And the user can pass the `data` as a parameter of `` component. Besides, the layout and other plot parameters could be set in the `single-page-app/src/zendro/plots/barchart.tsx`. + +Similarly, if the user wants to generate a plot named `circle` with `d3` library at the left menu, the following command should be executed in the `single-page-app` folder: +``` +zendro create-plot -f circle -t d3 -m left +``` +And the processed `data` could be passed as a parameter for the `` component in the file `single-page-app/src/pages/circle.tsx`. Meanwhile, the plot could be customized in the corresponding `single-page-app/src/zendro/plots/circle.tsx` file. + +### Generate default plots +By executing the following command in the `single-page-app` folder, three default plots would be generated: +``` +zendro create-plot -p +``` +1. scatter-plot + +Only numerical attributes could be selected in the scatter-plot. And the attribute for Y-axis must be specified for the plot. If the attribute for X-axis has not been selected, then values in Y-axis would be used for generating a plot. Besides, different modes for the plot could be selected, namely `lines`, `markers`, `lines+markers`. + +2. rain-cloud-plot + +Only numerical attributes could be selected in the rain-cloud-plot. It is possible to visualize multiple attributes within one plot. So the user needs to specify `The number of numerical attributes`, then corresponding selectors would be rendered. And the user can select necessary attributes in different data models. Besides, user can choose the `Direction` of the plot. 
Namely, the plot could be rendered horizontally (default) or vertically. Moreover, the user can specify the `Tickangle` and use the `Autoscale` button in the plot for a better alignment effect. Apart from that, the `Span mode` (default: soft) could be selected for the plot. And the detailed explanation for the span mode could be found [here](https://plotly.com/javascript/reference/violin/#violin-spanmode). + +3. boxplot + +The setup of a boxplot is very similar to a rain-cloud-plot. And there is no `Span mode` for generating a boxplot.