Running on MyBinder #1

Open
psychemedia opened this issue Dec 5, 2018 · 5 comments

Comments

@psychemedia

psychemedia commented Dec 5, 2018

To simplify the process of allowing folk to run the demo notebooks, you could set the repo up so that it runs on MyBinder.

For example, at the top level of the repo, create a requirements.txt file containing:

tables
pandas
matplotlib
scikit-learn
nltk==3.2.5
gensim
networkx
lxml

cufflinks
wordcloud
pyvis

and a postBuild file containing:

python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('wordnet')"

(I may have missed some other dependencies.)
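One way to spot missed dependencies before opening the notebooks is to check which imports resolve inside the built environment. A minimal sketch, assuming the import names below correspond to the packages listed above (scikit-learn is imported as sklearn); the helper name missing_modules is made up for illustration:

```python
from importlib.util import find_spec

# Import names assumed to match the requirements.txt entries above;
# scikit-learn installs as "sklearn", the rest share the package name.
EXPECTED = ["tables", "pandas", "matplotlib", "sklearn", "nltk", "gensim",
            "networkx", "lxml", "cufflinks", "wordcloud", "pyvis"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing:", missing_modules(EXPECTED) or "none")
```

Anything reported as missing would be a candidate for another line in requirements.txt.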

Then you should be able to launch a container and make a start running the notebooks from MyBinder: https://mybinder.org/v2/gh/yhilpisch/dnanlp/master
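The launch URL follows a simple owner/repo/ref pattern, so it is easy to generate for a fork or another branch. A small sketch; binder_url is a hypothetical helper, and the optional filepath query parameter (to open a particular notebook on launch) is included on the assumption that the Binder deployment supports it:

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="master", filepath=None):
    """Build a mybinder.org launch URL for a GitHub repo."""
    url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
    if filepath:
        # Optional: ask Binder to open a specific notebook on launch.
        url += "?filepath=" + quote(filepath)
    return url


print(binder_url("yhilpisch", "dnanlp"))
print(binder_url("yhilpisch", "dnanlp", filepath="02_nlp/02_nlp_openie.ipynb"))
```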

In a quick test, I noticed that the notebook 02_nlp/02_nlp_openie.ipynb has an import whose dependency I seem to have missed:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-16-a01c2d57ef10> in <module>()
      2 sys.path.append('../../../')
      3 sys.path.append('../../modules/')
----> 4 import soiepy.main as ie

ModuleNotFoundError: No module named 'soiepy'
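If the import fails because the relative paths passed to sys.path.append() resolve differently on Binder, one option is to search upwards from the notebook's working directory instead of hard-coding the number of "../" steps. A rough sketch, assuming soiepy sits in a directory called modules somewhere above the notebook (that location, and the helper name find_dir_upwards, are assumptions):

```python
import sys
from pathlib import Path


def find_dir_upwards(start, name):
    """Return the first directory called `name` in `start` or any parent."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        target = candidate / name
        if target.is_dir():
            return target
    return None


modules_dir = find_dir_upwards(Path.cwd(), "modules")
if modules_dir is not None:
    sys.path.append(str(modules_dir))  # then: import soiepy.main as ie
else:
    print("No 'modules' directory found above", Path.cwd())
```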
@yhilpisch
Owner

yhilpisch commented Dec 5, 2018 via email

@psychemedia
Author

psychemedia commented Dec 5, 2018

Java can be installed as well... e.g. create an apt.txt file and add something like:

openjdk-8-jre

(or whatever version) to have it installed.

Any explicit Linux command line steps you need to run to get / install unpackaged items can be added to the postBuild file.
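To confirm from inside a notebook that the apt.txt install actually put Java on the PATH, something like the following could be run in a cell. It is only a sketch; find_executable is an illustrative wrapper around the standard library's shutil.which:

```python
import shutil
import subprocess


def find_executable(name):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


java = find_executable("java")
if java is None:
    print("java not found on PATH; check apt.txt / postBuild")
else:
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(result.stderr.strip() or result.stdout.strip())
```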

@psychemedia
Author

I made a start here: https://github.com/ouseful-PR/dnanlp

The config files are:

  • apt.txt
  • requirements.txt
  • postBuild

Try it here: https://mybinder.org/v2/gh/ouseful-PR/dnanlp/master

Issues:

  • the 02_nlp_openie.ipynb notebook seems to run the first ie.stanford_ie() command okay, but not the second one? I wonder if this is a memory thing?
  • the 03_musk_01_data.ipynb and 04_harvey_01_data.ipynb are missing the DNA API key;
  • in 04_harvey_04_ng.ipynb, code cell 9: FileNotFoundError: File /home/jovyan/data_harvey/results/relations_harvey_250.h5 does not exist
  • the 03_musk_03_oie.ipynb fails at step 18 on Processing 2 of 12:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<timed exec> in <module>()

/srv/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    370             raise compat.FileNotFoundError(
--> 371                 'File %s does not exist' % path_or_buf)
    372 

FileNotFoundError: File /home/jovyan/data_musk/results/relations_musk_100.h5 does not exist

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
<timed exec> in <module>()

~/modules/soiepy/main.py in stanford_ie(input_filename, verbose, generate_graphviz)
    119         #java_process = Popen(command, stdout=stderr,    # stderr=open(os.devnull, 'w'), shell=True)
    120     java_process.wait()
--> 121     assert not java_process.returncode, 'ERROR: Call to stanford_ie exited with a non-zero code status.'
    122 
    123     with open(out, 'r') as output_file:
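The assertion only reports that the Java process exited with a non-zero code, not why. One way to surface the process's own error output (for instance an out-of-memory message, if memory really is the cause) is to capture stderr and include it in the exception. A sketch, assuming the command can be passed as an argument list; run_and_report is a hypothetical helper and not part of soiepy:

```python
import subprocess
import sys


def run_and_report(command):
    """Run `command`; on failure raise with the tail of its stderr."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        tail = "\n".join(result.stderr.splitlines()[-20:])
        raise RuntimeError(
            f"Command exited with code {result.returncode}:\n{tail}"
        )
    return result.stdout


# Stand-in for the Java call: a child process that fails with a message.
try:
    run_and_report([sys.executable, "-c",
                    "import sys; sys.exit('simulated failure')"])
except RuntimeError as err:
    print(err)
```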

I wonder - are the files missing because previous notebook steps did not compute the required results files?
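A quick way to test that theory is to list which of the expected results files are absent before running the later notebooks. A minimal sketch, using the paths from the error messages above and assuming it is run from the repo root (/home/jovyan on Binder); missing_results is an illustrative name:

```python
from pathlib import Path


def missing_results(paths):
    """Return the expected result files that do not exist yet."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Paths taken from the error messages above, relative to the repo root.
expected = [
    "data_musk/results/relations_musk_100.h5",
    "data_harvey/results/relations_harvey_250.h5",
]
for path in missing_results(expected):
    print("Not found (run the notebook that writes it first?):", path)
```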

@yhilpisch
Owner

yhilpisch commented Dec 5, 2018 via email

@psychemedia
Author

psychemedia commented Dec 5, 2018

By the by, with the Binder build elements in place, you can run things from the repo locally under Docker using repo2docker. That said, the resulting Docker image may not be as size optimised as it could be as a result of the automated build process.
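For reference, a basic local run only needs the repo URL handed to the repo2docker command line tool (installed as jupyter-repo2docker). The sketch below just assembles and prints that command rather than running it, since a build needs Docker and can take a while; repo2docker_command is a made-up helper name:

```python
import shutil


def repo2docker_command(repo_url):
    """Build the argument list for a basic repo2docker run."""
    return ["jupyter-repo2docker", repo_url]


command = repo2docker_command("https://github.com/ouseful-PR/dnanlp")
if shutil.which(command[0]) is None:
    print("repo2docker does not appear to be installed")
# The list could be passed to subprocess.run(command, check=True).
print("Command:", " ".join(command))
```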

Devs were also experimenting with a GPU enabled Binderhub for a demo yesterday, but I suspect that that won't hang around for long as part of the current free (research grant funded) public service.

psychemedia mentioned this issue Dec 6, 2018