Running on MyBinder #1
Hi Tony,
Thanks for your suggestions. Would indeed be a good idea.
However, the missing module is not (only) a Python package; it is a
Java package that also requires the Java runtime environment to be
installed.
This is the main reason why I have provided the Bash script
`setup_dna_nlp.sh`, which installs everything required -- not only the Python
packages.
Half of the Jupyter Notebooks rely on this Java package.
Best,
Yves
…On Wed, Dec 5, 2018 at 4:13 PM Tony Hirst ***@***.***> wrote:
To simplify the process of allowing folk to run the demo notebooks, you
could set the repo up so that it runs on MyBinder.
For example, at the top level of the repo, create a requirements.txt
file containing:
tables
pandas
matplotlib
scikit-learn
nltk==3.2.5
gensim
networkx
lxml
cufflinks
wordcloud
pyvis
and a postBuild file containing:
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('wordnet')"
(I may have missed some other dependencies.)
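The two postBuild commands above could also be collected into one small Python helper. The following is only a sketch, not something from the repo: the function name `download_nltk_resources` and its `download` parameter are assumptions made for illustration (the parameter lets the loop be exercised without network access), while the four resource names are the ones listed above.

```python
"""Sketch: fetch the NLTK resources named in the postBuild commands."""

# Resource names taken from the postBuild commands in this thread.
NLTK_RESOURCES = ["stopwords", "punkt", "vader_lexicon", "wordnet"]


def download_nltk_resources(resources=NLTK_RESOURCES, download=None):
    """Download each resource and return the names that failed.

    `download` is a hypothetical hook (not part of NLTK); when it is left
    as None, nltk.download is used.
    """
    if download is None:
        import nltk  # imported lazily so this file loads without NLTK

        download = nltk.download
    failed = []
    for name in resources:
        # nltk.download returns True on success and False otherwise.
        if not download(name):
            failed.append(name)
    return failed
```

Calling `download_nltk_resources()` with no arguments would attempt all four downloads and return an empty list if every one succeeded.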
Then you should be able to launch a container and make a start running the
notebooks from MyBinder:
https://mybinder.org/v2/gh/yhilpisch/dnanlp/master
In a quick test, I noticed that the notebook 02_nlp/02_nlp_openie.ipynb
has an import whose dependency I've missed:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-16-a01c2d57ef10> in <module>()
2 sys.path.append('../../../')
3 sys.path.append('../../modules/')
----> 4 import soiepy.main as ie
ModuleNotFoundError: No module named 'soiepy'
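A missing dependency like this one could be detected before any notebook cell runs. The sketch below is an assumption for illustration rather than anything in the repo: the helper name `missing_modules` is hypothetical, and it only reports top-level module names that Python cannot locate on the current `sys.path`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not on sys.path.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For the notebook above, `missing_modules(["soiepy"])` would need to run after the two `sys.path.append` calls, since the module is located through those paths.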
--
Dr Yves J Hilpisch
The Python Quants GmbH
+49 3212 1129194
http://aimachine.io
http://certificate.tpq.io
http://tpq.io | http://pqp.io
Java can be installed as well... e.g. create an
(or whatever version) to have it installed. Any explicit Linux command line steps you need to run to get / install unpackaged items can be added to the
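Whether a Java runtime actually ended up in the built environment can be checked from Python. This is a minimal sketch under the assumption that the runtime is exposed as a `java` executable on the `PATH`; the helper name `java_version_line` is hypothetical.

```python
import shutil
import subprocess


def java_version_line():
    """Return the first line of `java -version`, or None if Java is absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None
```

A return value of `None` means no usable Java runtime was found, which would explain a failure in any notebook that depends on the Java package.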
Thanks for the initiative.
First of all, I started all this yesterday (before going on a short
business trip) and it is not yet finished.
Some notebooks require a DNA API key to retrieve data which is only
available via a paid subscription.
Indeed, some notebooks require results from other ones and fail if the
respective files are not yet created.
So far I have tested the code on a DigitalOcean droplet with 4 cores and 8GB
RAM plus 10GB of swap. It all worked. However, some commands need 20+
minutes to execute.
The failure of the Java package is most probably a memory issue
(see the spec above -- 8GB was not enough for some commands).
I will work further on the repo. It is just a start and work in progress.
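Since some notebooks depend on results files written by earlier ones, a notebook could check for its inputs before trying to read them. The sketch below is an illustration only: `missing_files` is a hypothetical helper, and the paths passed to it would be whatever results files a given notebook expects.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(path) for path in map(Path, paths) if not path.is_file()]
```

A notebook could call this at the top with its expected inputs and stop with a message naming the notebook that produces them, rather than failing later inside `read_hdf`.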
…On Wed, Dec 5, 2018 at 6:28 PM Tony Hirst ***@***.***> wrote:
I made a start here: https://github.com/ouseful-PR/dnanlp
The config files are:
- apt.txt
- requirements.txt
- postBuild
Try it here: https://mybinder.org/v2/gh/ouseful-PR/dnanlp/master
Issues:
- the 02_nlp_openie.ipynb notebook seems to run the first
ie.stanford_ie() command okay, but not the second one? I wonder if
this is a memory thing?
- the 03_musk_01_data.ipynb and 04_harvey_01_data.ipynb are missing
the *DNA API key*;
- in 04_harvey_04_ng.ipynb, code cell 9: FileNotFoundError: File
/home/jovyan/data_harvey/results/relations_harvey_250.h5 does not exist
- the 03_musk_03_oie.ipynb fails at step 18 on *Processing 2 of 12*:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<timed exec> in <module>()
/srv/conda/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
370 raise compat.FileNotFoundError(
--> 371 'File %s does not exist' % path_or_buf)
372
FileNotFoundError: File /home/jovyan/data_musk/results/relations_musk_100.h5 does not exist
During handling of the above exception, another exception occurred:
AssertionError Traceback (most recent call last)
<timed exec> in <module>()
~/modules/soiepy/main.py in stanford_ie(input_filename, verbose, generate_graphviz)
119 #java_process = Popen(command, stdout=stderr, # stderr=open(os.devnull, 'w'), shell=True)
120 java_process.wait()
--> 121 assert not java_process.returncode, 'ERROR: Call to stanford_ie exited with a non-zero code status.'
122
123 with open(out, 'r') as output_file:
I wonder - are the files missing because previous notebook steps did not
compute the required results files?
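The assertion in the traceback above only reports that the Java process exited with a non-zero code, which makes it hard to tell a memory problem from any other failure. One way to surface the cause is to capture the process's stderr and include it in the error. This is a generic sketch, not how `soiepy` is implemented; `run_and_report` is a hypothetical helper.

```python
import subprocess


def run_and_report(command):
    """Run a command; raise RuntimeError with exit code and stderr on failure."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Include stderr so the underlying error message is not lost.
        raise RuntimeError(
            f"Command exited with code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout
```

If the Java process ran out of memory, the captured stderr would be expected to contain the JVM's own error message, though that depends on how the process was terminated.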
By the by, with the Binder build elements in place, you can run things from the repo locally under Docker using
Devs were also experimenting with a GPU-enabled BinderHub for a demo yesterday, but I suspect that it won't hang around for long as part of the current free (research grant funded) public service.