Commit 652d101: Merge pull request #1605 from microsoft/staging

Staging to master for making release
miguelgfierro authored Jan 12, 2022
2 parents dce8f71 + 7af6edd commit 652d101
Showing 14 changed files with 94 additions and 64 deletions.
4 changes: 3 additions & 1 deletion NEWS.md
@@ -1,5 +1,8 @@
# What's New

## Update January 13, 2022

We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.

## Update September 27, 2021

@@ -13,7 +16,6 @@ We have also added new evaluation metrics: _novelty, serendipity, diversity and

Code coverage reports are now generated for every PR, using [Codecov](https://about.codecov.io/).


## Update June 21, 2021

We have a new release [Recommenders 0.6.0](https://github.com/microsoft/recommenders/releases/tag/0.6.0)!
73 changes: 40 additions & 33 deletions README.md
@@ -2,18 +2,15 @@

[![Documentation Status](https://readthedocs.org/projects/microsoft-recommenders/badge/?version=latest)](https://microsoft-recommenders.readthedocs.io/en/latest/?badge=latest)

-## What's New (September 27, 2021)
+## What's New (January 13, 2022)

-We have a new release [Recommenders 0.7.0](https://github.com/microsoft/recommenders/releases/tag/0.7.0)!
+We have a new release [Recommenders 1.0.0](https://github.com/microsoft/recommenders/releases/tag/1.0.0)! The codebase has now migrated to TensorFlow versions 2.6 / 2.7 and to Spark version 3. In addition, there are a few changes in the dependencies and extras installed by `pip` (see [this guide](recommenders/README.md#optional-dependencies)). We have also made improvements in the code and the CI / CD pipelines.

-In this, we have changed the names of the folders which contain the source code, so that they are more informative. This implies that you will need to change any import statements that reference the recommenders package. Specifically, the folder `reco_utils` has been renamed to `recommenders` and its subfolders have been renamed according to [issue 1390](https://github.com/microsoft/recommenders/issues/1390).
-
-The recommenders package now supports three types of environments: [venv](https://docs.python.org/3/library/venv.html), [virtualenv](https://virtualenv.pypa.io/en/latest/index.html#) and [conda](https://docs.conda.io/projects/conda/en/latest/glossary.html?highlight=environment#conda-environment) with Python versions 3.6 and 3.7.
-
-We have also added new evaluation metrics: _novelty, serendipity, diversity and coverage_ (see the [evalution notebooks](examples/03_evaluate/README.md)).
-
-Code coverage reports are now generated for every PR, using [Codecov](https://about.codecov.io/).
+Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!
+
+Here you can find the PyPi page: https://pypi.org/project/recommenders/

Here you can find the package documentation: https://microsoft-recommenders.readthedocs.io/en/latest/
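
As a quick sanity check of the new release, it can be installed straight from PyPI; a minimal sketch, where the version pin simply targets the release announced above:

```bash
# Install the announced 1.0.0 release with the example dependencies
pip install recommenders[examples]==1.0.0
```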

## Introduction

@@ -40,41 +37,51 @@ and currently does not support version 3.8 and above. It is recommended to insta

To set up on your local machine:

* To install core utilities, CPU-based algorithms, and dependencies:

1. Ensure software required for compilation and Python libraries is installed.

   + On Linux this can be supported by adding:

     ```bash
     sudo apt-get install -y build-essential libpython<version>
     ```

     where `<version>` should be `3.6` or `3.7` as appropriate.

   + On Windows you will need [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).

2. Create a conda or virtual environment. See the [setup guide](SETUP.md) for more details.

3. Within the created environment, install the package from [PyPI](https://pypi.org):

   ```bash
   pip install --upgrade pip
   pip install --upgrade setuptools
   pip install recommenders[examples]
   ```

4. Register your (conda or virtual) environment with Jupyter:

   ```bash
   python -m ipykernel install --user --name my_environment_name --display-name "Python (reco)"
   ```

5. Start the Jupyter notebook server:

   ```bash
   jupyter notebook
   ```

6. Run the [SAR Python CPU MovieLens](examples/00_quick_start/sar_movielens.ipynb) notebook under the `00_quick_start` folder. Make sure to change the kernel to "Python (reco)".

* For additional options to install the package (support for GPU, Spark etc.) see [this guide](recommenders/README.md).

**NOTE** - The [Alternating Least Squares (ALS)](examples/00_quick_start/als_movielens.ipynb) notebooks require a PySpark environment to run. Please follow the steps in the [setup guide](SETUP.md#dependencies-setup) to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine and to follow the steps in the [setup guide](SETUP.md#dependencies-setup) to set up Nvidia libraries.
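
For the Spark and GPU scenarios mentioned in the note above, a sketch of the corresponding installs, assuming the `spark` and `gpu` extras described in [this guide](recommenders/README.md) (the exact extra names are not shown in this diff):

```bash
# Illustrative: dependencies for the PySpark-based notebooks such as ALS
pip install recommenders[spark]

# Illustrative: dependencies for the deep learning algorithms
pip install recommenders[gpu]
```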

4 changes: 2 additions & 2 deletions SETUP.md
@@ -6,7 +6,6 @@ This document describes how to setup all the dependencies to run the notebooks i
* [Azure Databricks](https://azure.microsoft.com/en-us/services/databricks/)
* Docker container


## Table of Contents

- [Compute environments](#compute-environments)
@@ -397,7 +396,7 @@ You can then open the Jupyter notebook server at http://localhost:8888

The process of making a new release and publishing it to pypi is as follows:

First make sure that the tag that you want to add, e.g. `0.6.0`, is added in [`recommenders.py/__init__.py`](recommenders.py/__init__.py). Follow the [contribution guideline](CONTRIBUTING.md) to add the change.

1. Make sure that the code in main passes all the tests (unit and nightly tests).
1. Create a tag with the version number: e.g. `git tag -a 0.6.0 -m "Recommenders 0.6.0"`.
@@ -406,4 +405,5 @@ First make sure that the tag that you want to add, e.g. `0.6.0`, is added in [re
generates a wheel and a tar.gz which are uploaded to a [GitHub draft release](https://github.com/microsoft/recommenders/releases).
1. Fill up the draft release with all the recent changes in the code.
1. Download the wheel and tar.gz locally, these files shouldn't have any bug, since they passed all the tests.
1. Install twine: `pip install twine`
1. Publish the wheel and tar.gz to pypi: `twine upload recommenders*`
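
Condensed, the final steps look like this (assuming the wheel and tar.gz downloaded from the draft release sit in the current directory):

```bash
pip install twine
# Optional: validate the built distributions before uploading
twine check recommenders*
# Publish to PyPI
twine upload recommenders*
```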
3 changes: 2 additions & 1 deletion docs/README.md
@@ -6,7 +6,8 @@ To setup the documentation, first you need to install the dependencies of the fu
conda activate reco_full

pip install numpy cython
-pip install --no-binary scikit-surprise .[all,experimental]
+pip install --no-binary scikit-surprise "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz"
+pip install "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip"
pip install sphinx_rtd_theme


4 changes: 2 additions & 2 deletions examples/00_quick_start/tfidf_covid.ipynb
@@ -16,7 +16,7 @@
"# TF-IDF Content-Based Recommendation on the COVID-19 Open Research Dataset\n",
"This demonstrates a simple implementation of Term Frequency Inverse Document Frequency (TF-IDF) content-based recommendation on the [COVID-19 Open Research Dataset](https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research/), hosted through Azure Open Datasets.\n",
"\n",
"In this notebook, we will create a recommender which will return the top k recommended articles similar to any article of interest (query item) in the COVID-19 Open Reserach Dataset."
"In this notebook, we will create a recommender which will return the top k recommended articles similar to any article of interest (query item) in the COVID-19 Open Research Dataset."
]
},
{
@@ -1229,4 +1229,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
10 changes: 8 additions & 2 deletions recommenders/README.md
@@ -35,7 +35,7 @@ By default `recommenders` does not install all dependencies used throughout the
- experimental: current experimental dependencies that are being evaluated (e.g. libraries that require advanced build requirements or might conflict with libraries from other options)
- nni: dependencies for NNI tuning framework.

-Note that, currently, xLearn, Surprise and Vowpal Wabbit are in the experimental group.
+Note that, currently, xLearn and Vowpal Wabbit are in the experimental group.

These groups can be installed alone or in combination:
```bash
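# Illustrative combination only; the concrete examples are collapsed in this diff view
pip install recommenders[examples,experimental]
```
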
@@ -64,10 +64,16 @@ When installing with GPU support you will need to point to the PyTorch index to

We are currently evaluating inclusion of the following dependencies:

-- scikit-surprise: due to incompatibilities with `numpy <= 1.19`, proper installation of Surprise requires `pip install numpy cython` and `pip install --no-binary scikit-surprise recommenders[experimental]`
 - vowpalwabbit: current examples show how to use vowpal wabbit after it has been installed on the command line; using the [PyPI package](https://pypi.org/project/vowpalwabbit/) with the scikit-learn interface will facilitate easier integration into python environments
 - xlearn: on some platforms, xLearn requires pre-installation of cmake.

+## Other dependencies
+
+Some dependencies are not available via the recommenders PyPI package, but can be installed in the following ways:
+- scikit-surprise: due to incompatibilities with `numpy <= 1.19`, proper installation of Surprise requires `pip install numpy cython` and `pip install --no-binary scikit-surprise "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz"`
+- pymanopt: this dependency is required for the RLRMC and GeoIMC algorithms; a version of this code compatible with TensorFlow 2 can be installed with `pip install "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip"`.

## NNI dependencies

For NNI a more recent version can be installed but is untested.
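
Assuming the `nni` extra defined in `setup.py` (pinned there to `nni==1.5`, as shown further down), the tested configuration is installed with:

```bash
pip install recommenders[nni]
```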
11 changes: 6 additions & 5 deletions setup.py
@@ -39,8 +39,6 @@
"memory_profiler>=0.54.0,<1",
"nltk>=3.4,<4",
"pydocumentdb>=2.3.3<3", # TODO: replace with azure-cosmos
-# Temporary fix for pymanopt, only this commit works with TF2
-"pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip",
"seaborn>=0.8.1,<1",
"transformers>=2.5.0,<5",
"bottleneck>=1.2.1,<2",
@@ -93,9 +91,6 @@
extras_require["experimental"] = [
# xlearn requires cmake to be pre-installed
"xlearn==0.40a1",
-# Surprise needs to be built from source because of the numpy <= 1.19 incompatibility
-# Requires pip to be run with the --no-binary option
-"scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz",
# VW C++ binary needs to be installed manually for some code to work
"vowpalwabbit>=8.9.0,<9",
]
@@ -104,6 +99,12 @@
"nni==1.5",
]

+# The following dependencies can be installed as below, however PyPI does not allow direct URLs.
+# Surprise needs to be built from source because of the numpy <= 1.19 incompatibility
+# Requires pip to be run with the --no-binary option
+# "scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz",
+# Temporary fix for pymanopt, only this commit works with TF2
+# "pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip",

setup(
name="recommenders",
2 changes: 1 addition & 1 deletion tests/ci/azure_pipeline_test/dsvm_nightly_linux_cpu.yml
@@ -33,6 +33,6 @@ extends:
timeout: 180
conda_env: "nightly_linux_cpu"
conda_opts: "python=3.6"
-pip_opts: "[examples,dev,experimental] --no-cache --no-binary scikit-surprise"
+pip_opts: "[examples,dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
pytest_markers: "not spark and not gpu"
pytest_params: "-x"
2 changes: 1 addition & 1 deletion tests/ci/azure_pipeline_test/dsvm_notebook_linux_cpu.yml
@@ -60,5 +60,5 @@ extends:
task_name: "Test - Unit Notebook Linux CPU"
conda_env: "unit_notebook_linux_cpu"
conda_opts: "python=3.6"
-pip_opts: "[examples,dev,experimental] --no-cache --no-binary scikit-surprise"
+pip_opts: "[examples,dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
pytest_markers: "notebooks and not spark and not gpu"
2 changes: 1 addition & 1 deletion tests/ci/azure_pipeline_test/dsvm_unit_linux_cpu.yml
@@ -60,5 +60,5 @@ extends:
task_name: "Test - Unit Linux CPU"
conda_env: "unit_linux_cpu"
conda_opts: "python=3.6"
-pip_opts: "[dev,experimental] --no-cache --no-binary scikit-surprise"
+pip_opts: "[dev,experimental] 'scikit-surprise@https://github.com/NicolasHug/Surprise/archive/refs/tags/v1.1.1.tar.gz' 'pymanopt@https://github.com/pymanopt/pymanopt/archive/fb36a272cdeecb21992cfd9271eb82baafeb316d.zip' --no-cache --no-binary scikit-surprise"
pytest_markers: "not notebooks and not spark and not gpu"
1 change: 1 addition & 0 deletions tests/integration/examples/test_notebooks_python.py
@@ -236,6 +236,7 @@ def test_cornac_bpr_integration(


@pytest.mark.integration
+@pytest.mark.experimental
@pytest.mark.parametrize(
"expected_values",
[({"rmse": 0.4969, "mae": 0.4761})],
1 change: 1 addition & 0 deletions tests/unit/examples/test_notebooks_python.py
@@ -103,6 +103,7 @@ def test_wikidata_runs(notebooks, output_notebook, kernel_name, tmp):
)


+@pytest.mark.experimental
@pytest.mark.notebooks
def test_rlrmc_quickstart_runs(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["rlrmc_quickstart"]
39 changes: 25 additions & 14 deletions tests/unit/recommenders/models/test_geoimc.py
@@ -1,20 +1,23 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

-import collections
-import pytest
-import numpy as np
-from scipy.sparse import csr_matrix
-
-from recommenders.models.geoimc.geoimc_data import DataPtr
-from recommenders.models.geoimc.geoimc_predict import Inferer
-from recommenders.models.geoimc.geoimc_algorithm import IMCProblem
-from recommenders.models.geoimc.geoimc_utils import (
-    length_normalize,
-    mean_center,
-    reduce_dims,
-)
-from pymanopt.manifolds import Stiefel, SymmetricPositiveDefinite
+try:
+    import collections
+    import pytest
+    import numpy as np
+    from scipy.sparse import csr_matrix
+
+    from recommenders.models.geoimc.geoimc_data import DataPtr
+    from recommenders.models.geoimc.geoimc_predict import Inferer
+    from recommenders.models.geoimc.geoimc_algorithm import IMCProblem
+    from recommenders.models.geoimc.geoimc_utils import (
+        length_normalize,
+        mean_center,
+        reduce_dims,
+    )
+    from pymanopt.manifolds import Stiefel, SymmetricPositiveDefinite
+except:
+    pass  # skip if pymanopt not installed

_IMC_TEST_DATA = [
(
@@ -35,6 +38,7 @@


# `geoimc_data` tests
+@pytest.mark.experimental
@pytest.mark.parametrize("data, entities", _IMC_TEST_DATA)
def test_dataptr(data, entities):
ptr = DataPtr(data, entities)
@@ -44,6 +48,7 @@ def test_dataptr(data, entities):


# `geoimc_utils` tests
+@pytest.mark.experimental
@pytest.mark.parametrize(
"matrix",
[
@@ -59,6 +64,7 @@ def test_length_normalize(matrix):
)


+@pytest.mark.experimental
@pytest.mark.parametrize(
"matrix",
[
@@ -73,19 +79,22 @@ def test_mean_center(matrix):
)


+@pytest.mark.experimental
def test_reduce_dims():
matrix = np.random.rand(100, 100)
assert reduce_dims(matrix, 50).shape[1] == 50


# `geoimc_algorithm` tests
+@pytest.mark.experimental
@pytest.mark.parametrize(
"dataPtr, rank",
[
(DataPtr(_IMC_TEST_DATA[0][0], _IMC_TEST_DATA[0][1]), 3),
(DataPtr(_IMC_TEST_DATA[1][0], _IMC_TEST_DATA[1][1]), 3),
],
)
+@pytest.mark.experimental
def test_imcproblem(dataPtr, rank):

# Test init
@@ -110,10 +119,12 @@ def test_imcproblem(dataPtr, rank):


# `geoimc_predict` tests
+@pytest.mark.experimental
def test_inferer_init():
assert Inferer(method="dot").method.__name__ == "PlainScalarProduct"


+@pytest.mark.experimental
@pytest.mark.parametrize(
"dataPtr",
[
2 changes: 1 addition & 1 deletion tests/unit/recommenders/models/test_surprise_utils.py
@@ -17,7 +17,7 @@
compute_ranking_predictions,
)
except:
-    pass  # skip if experimental not installed
+    pass  # skip if surprise not installed

TOL = 0.001

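The `@pytest.mark.experimental` markers added across these test files let a run deselect cases whose optional dependencies (pymanopt, scikit-surprise) are missing. A typical invocation, using standard pytest marker selection rather than anything specific to this repo's pipelines:

```bash
# Run the unit tests, skipping experimental-only cases
pytest tests/unit -m "not experimental"
```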
