vecnet/vecnet-dl

Vecnet Metadata Catalog

This application provides the Vecnet Metadata Catalog. It handles the curation and indexing of the data generated by the Vecnet cyberinfrastructure.

Dependencies

  • Fedora Commons 3.6
  • Solr 4.3
  • Redis (version?)
  • PostgreSQL or other SQL database
  • nginx
  • chruby Ruby version manager

The SETUP file has detailed steps on installing the platform on a bare RHEL machine.

Deployment

First, your public SSH key needs to be installed on the server; ask Don to do this. To deploy to QA:

cap qa deploy

To deploy to Production:

cap production deploy

To deploy from a branch:

cap <environment> deploy -S branch=<branch name>

To deploy a new nginx config (this also reloads nginx):

cap <environment> vecnet:update_nginx_config
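
The deploy commands above can be wrapped in a small guard so that a mistyped environment name never reaches Capistrano. This is only a sketch: `deploy_cmd` is a hypothetical helper, not part of this repository, and it accepts only `qa` and `production` because those are the two environments named in this README. It prints the command rather than running it.

```shell
# deploy_cmd ENV [BRANCH]: print the cap command for ENV, optionally for BRANCH.
# Hypothetical helper (an assumption, not part of the repository).
deploy_cmd() {
    env_name=${1:-}
    branch=${2:-}
    case "$env_name" in
        qa|production) ;;
        *) echo "unknown environment: $env_name" >&2; return 1 ;;
    esac
    if [ -n "$branch" ]; then
        # Same form as the branch deploy shown above.
        echo "cap $env_name deploy -S branch=$branch"
    else
        echo "cap $env_name deploy"
    fi
}
```

For example, `deploy_cmd qa my-feature` prints the branch deploy command for QA, which can then be reviewed and run by hand.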

Other server admin tasks

To rebuild the Fedora object store:

sudo service tomcat6 stop
cd /opt/fedora/server/bin
sudo FEDORA_HOME=/opt/fedora CATALINA_HOME=/usr/share/tomcat6 ./fedora-rebuild.sh
# choose option 1 to rebuild the resource index
sudo FEDORA_HOME=/opt/fedora CATALINA_HOME=/usr/share/tomcat6 ./fedora-rebuild.sh
# choose option 2 to rebuild the SQL database
sudo service tomcat6 start

To resolrize everything (this will take a LONG time to complete):

chruby 2.0.0-p353
RAILS_ENV=qa bundle exec rake solrizer:fedora:solrize_objects

To load and build the MeSH trees, run the following. It takes a while (roughly 0.5 to 1 hour):

chruby 2.0.0-p353
RAILS_ENV=qa bundle exec rake vecnet:import:mesh_subjects vecnet:import:eval_mesh_trees

To resolrize with MeSH synonyms (this will take a LONG time to complete):

chruby 2.0.0-p353
# Build the synonyms.txt file if needed.
# You can skip this step if the synonyms did not change.
RAILS_ENV=qa bundle exec rake vecnet:solrize_synonym:get_synonyms FILE=solr_conf/conf/synonyms.txt
# Copy the synonyms file to the Solr core
sudo cp solr_conf/conf/synonyms.txt /opt/solr-4.3.0/vecnet/conf/synonyms.txt
# Copy schema and solrconfig
sudo cp solr_conf/conf/schema.xml /opt/solr-4.3.0/vecnet/conf/schema.xml
sudo cp solr_conf/conf/solrconfig.xml /opt/solr-4.3.0/vecnet/conf/solrconfig.xml
# Change the owner to tomcat
sudo chown -R tomcat:tomcat /opt/solr-4.3.0
# Restart Solr
sudo service tomcat6 restart
# Resolrize all objects
RAILS_ENV=qa bundle exec rake solrizer:fedora:solrize_objects
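
Because the resolrize step is slow, it can be worth checking whether any of the three Solr config files actually differ from the installed copies before copying them and restarting Tomcat. The following is a minimal sketch of such a check; `conf_changed` and `changed_confs` are hypothetical helpers, not part of this repository, and the file list is simply the three files copied above.

```shell
# conf_changed SRC DEST: succeed when DEST is missing or differs from SRC.
conf_changed() {
    [ ! -f "$2" ] || ! cmp -s "$1" "$2"
}

# changed_confs SRC_DIR DEST_DIR: print the name of each config file that
# exists in SRC_DIR and would need to be copied into DEST_DIR.
changed_confs() {
    for name in synonyms.txt schema.xml solrconfig.xml; do
        [ -f "$1/$name" ] || continue
        if conf_changed "$1/$name" "$2/$name"; then
            echo "$name"
        fi
    done
}
```

Running `changed_confs solr_conf/conf /opt/solr-4.3.0/vecnet/conf` lists the files that differ. If it prints nothing, the installed configuration already matches the checked-out one.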

To ingest citations into QA/Production:

# Copy the EndNote file to /opt/endnote and make sure everyone can read it
sudo cp /from/path/to/endnote/file /opt/endnote
sudo chmod -R 755 /opt/endnote
# Copy the PDFs to /opt/citation_file/<createfolder_with_endnote_file_name> and make sure everyone can read them
sudo cp -r /from/path/to/endnote/pdf/* /opt/citation_file/
sudo chmod -R 755 /opt/citation_file/
# Execute the citation task as the app user
sudo su app
cd /home/app/vecnet/current
chruby 2.0.0-p353
RAILS_ENV=production bundle exec rake vecnet:import:endnote_conversion ENDNOTE_FILE=/opt/endnote/ ENDNOTE_PDF_PATH=/opt/citation_files:/opt/citation_files/

Initializing new production environment

  1. Do system setup as in the SETUP file
  2. Get capistrano deploy working to the new site
  3. On the production machine:
       • Set up Ruby: chruby 2.0.0-p353
       • Set up MeSH terms: RAILS_ENV=production bundle exec rake vecnet:import:mesh_subjects vecnet:import:eval_mesh_trees
       • Migrate user table: See below
       • Resolrize: RAILS_ENV=production bundle exec rake solrizer:fedora:solrize_objects
       • Migrate Fedora objects: RAILS_ENV=production bundle exec rake vecnet:migrate:batch_to_collection
  4. Done!

NCBI Terminology

Work in progress. After running rake db:migrate, the task below will download the NCBI taxonomy from

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

and ingest the terms into the database.

rake vecnet:import:ncbi_taxonomy

There are about 1,091,096 terms (November 2013).

Gather repository contents for statistics

OUTFILE=~/repo-stats-20130916.csv RAILS_ENV=production bundle exec rake vecnet:dump_statistics
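
The output file name above embeds the date by hand. A small sketch of generating it instead, assuming the `repo-stats-YYYYMMDD.csv` naming used above is the desired convention; `stats_outfile` is a hypothetical helper, not part of this repository.

```shell
# stats_outfile [DIR]: print a dated CSV path, DIR/repo-stats-YYYYMMDD.csv.
# DIR defaults to the home directory, matching the example above.
stats_outfile() {
    printf '%s/repo-stats-%s.csv\n' "${1:-$HOME}" "$(date +%Y%m%d)"
}
```

It could then be used as `OUTFILE=$(stats_outfile) RAILS_ENV=production bundle exec rake vecnet:dump_statistics`.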

Pubtkt Authentication

The site uses the pubtkt authentication scheme, which uses a signed cookie for every request. For development, a dummy login to create a pubtkt is provided (class DevelopmentSessions). First, however, a public/private key pair needs to be generated and installed:

rake pubtkt:generate_keys
mv pubtkt.pem config/pubtkt-development.pem
mv pubtkt-private.pem config/pubtkt-private-development.pem

And that should be enough for development. There are also utility rake tasks for creating and verifying tickets:

  1. To create a ticket on the command line:

    $ P_KEY=pubtkt-private.pem P_UID=dbrower P_VALIDUNTIL=3456789012 P_TOKENS='dl_librarian,dl_write' rake pubtkt:create
    uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write;sig=MCwCFHiaErA+7lHoHxbSUIZaSnmTovIPAhRf4RxtrmArBMD8CBnZaUM/yWI+Cw==

The validuntil value above is a Unix epoch timestamp corresponding to July 16, 2079, so the ticket should not expire while you are using it.

  2. To validate tickets from the command line:

$ P_KEY=pubtkt-private.pem P_TICKET='uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write;sig=MCwCFF1/aaSbtrxN9PLrZE1XvLH5SIWQAhRXN8AHevzPMFbMuIIlOwuCLTZDPw==' rake pubtkt:verify
Ticket text: uid=dbrower;validuntil=3456789012;tokens=dl_librarian,dl_write
Ticket sig : MCwCFF1/aaSbtrxN9PLrZE1XvLH5SIWQAhRXN8AHevzPMFbMuIIlOwuCLTZDPw==
Sig Valid? : true
Expired?   : true
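
The ticket text is a semicolon-separated list of key=value fields, so it can also be inspected directly in the shell. The following is a rough sketch, independent of the rake tasks: the three function names are hypothetical, the expiry check only compares validuntil against a clock value and does not verify the signature, and the date conversion assumes GNU date (the `-d @SECONDS` form).

```shell
# ticket_field TICKET NAME: print the value of one field of a pubtkt ticket.
ticket_field() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n "s/^$2=//p"
}

# ticket_expired TICKET [NOW]: succeed when validuntil is at or before NOW.
# NOW is in epoch seconds and defaults to the current time.
ticket_expired() {
    valid_until=$(ticket_field "$1" validuntil)
    now=${2:-$(date +%s)}
    [ "$valid_until" -le "$now" ]
}

# ticket_valid_until_utc TICKET: print validuntil as an ISO 8601 UTC time.
# Assumes GNU date; BSD/macOS date uses a different option (-r).
ticket_valid_until_utc() {
    date -u -d "@$(ticket_field "$1" validuntil)" +%Y-%m-%dT%H:%M:%SZ
}
```

For the example ticket, `ticket_valid_until_utc` prints 2079-07-17T03:10:12Z, which falls on the evening of July 16 in US time zones.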