Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.

mcjaeger/atarashi

This branch is 63 commits behind fossology/atarashi:master.

Atarashi

This is a Google Summer of Code project.
Atarashi is a new license scanner that is meant to be integrated with FOSSology but also to work independently.

Requirements

  • Python v3.x
  • pip

Steps for Installation

Build (optional)

  • $ python setup.py build
  • The build generates three new files in your current directory:
    1. data/Ngram_keywords.json
    2. licenses/<SPDX-version>.csv
    3. licenses/processedList.csv
  • The install script moves these files to their appropriate locations.
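To confirm that the optional build step produced the three files listed above, a small check along the following lines can help. This is only a sketch: the helper names `find_build_outputs` and `missing_build_outputs` are hypothetical and not part of atarashi, and it assumes that any CSV in licenses/ other than processedList.csv is the <SPDX-version>.csv file.

```python
from pathlib import Path


def find_build_outputs(root="."):
    """Map each expected build output to the list of matching paths under root."""
    root = Path(root)
    licenses = root / "licenses"
    keywords = root / "data" / "Ngram_keywords.json"
    processed = licenses / "processedList.csv"
    # Assumption: every other CSV in licenses/ is a <SPDX-version>.csv file.
    spdx = sorted(p for p in licenses.glob("*.csv") if p.name != "processedList.csv")
    return {
        "ngram_keywords": [keywords] if keywords.is_file() else [],
        "spdx_csv": spdx,
        "processed_list": [processed] if processed.is_file() else [],
    }


def missing_build_outputs(root="."):
    """Return the names of expected outputs that were not found under root."""
    return [name for name, paths in find_build_outputs(root).items() if not paths]


if __name__ == "__main__":
    missing = missing_build_outputs()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the directory where `python setup.py build` was executed, before the install script moves the files.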

Install

  • # python setup.py install
  • In the install folder, make atarashi-install.sh executable
  • Run ./atarashi-install.sh
  • pip install -r <path/to/requirements.txt>

How to run

Get help by running atarashi -h or atarashi --help

Example

  • Running DLD agent

    atarashi -a DLD /path/to/file.c

  • Running wordFrequencySimilarity agent

    atarashi -a wordFrequencySimilarity /path/to/file.c

  • Running tfidf agent

    • With Cosine similarity

      atarashi -a tfidf /path/to/file.c

      atarashi -a tfidf -s CosineSim /path/to/file.c

    • With Score similarity

      atarashi -a tfidf -s ScoreSim /path/to/file.c

  • Running Ngram agent

    • With Cosine similarity

      atarashi -a Ngram /path/to/file.c

      atarashi -a Ngram -s CosineSim /path/to/file.c

    • With Dice similarity

      atarashi -a Ngram -s DiceSim /path/to/file.c

    • With Bigram Cosine similarity

      atarashi -a Ngram -s BigramCosineSim /path/to/file.c

  • Running in verbose mode

    atarashi -a DLD -v /path/to/file.c

  • Running with custom CSVs and JSONs

    • Please refer to the build instructions to generate CSV and JSON files that atarashi can read.
    • atarashi -a DLD -l /path/to/processedList.csv /path/to/file.c
    • atarashi -a Ngram -l /path/to/processedList.csv -j /path/to/ngram.json /path/to/file.c
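When atarashi is called from a script, the command lines above can be assembled programmatically. The following is a minimal sketch, assuming atarashi is installed and on the PATH. The helpers `build_atarashi_command` and `run_atarashi` are hypothetical names introduced for illustration; only the flags shown in the examples above (-a, -s, -v, -l, -j) are used.

```python
import subprocess


def build_atarashi_command(agent, target, similarity=None, license_csv=None,
                           ngram_json=None, verbose=False):
    """Build an atarashi argument list from the options shown in the examples."""
    cmd = ["atarashi", "-a", agent]
    if similarity:
        cmd += ["-s", similarity]      # e.g. CosineSim, ScoreSim, DiceSim
    if verbose:
        cmd.append("-v")
    if license_csv:
        cmd += ["-l", license_csv]     # custom processedList.csv
    if ngram_json:
        cmd += ["-j", ngram_json]      # custom Ngram JSON
    cmd.append(target)
    return cmd


def run_atarashi(agent, target, **options):
    """Run atarashi and return the CompletedProcess (needs atarashi on PATH)."""
    cmd = build_atarashi_command(agent, target, **options)
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    # Only print the commands here, so the sketch works without atarashi installed.
    print(" ".join(build_atarashi_command("DLD", "/path/to/file.c", verbose=True)))
    print(" ".join(build_atarashi_command("Ngram", "/path/to/file.c",
                                          similarity="DiceSim")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.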

Test

  • Run imtihaan (meaning "exam" in Hindi) with the name of the agent.
  • E.g. python atarashi/imtihaan.py /path/to/processedList.csv <DLD|tfidf|Ngram> <testfile>
  • See python atarashi/imtihaan.py --help for more options
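To test all three agents in one go, the imtihaan invocation above can be wrapped in a loop. This sketch only follows the argument order shown in the example (processed list, agent name, test file). The function `imtihaan_commands` is a hypothetical helper, and it uses the current interpreter via `sys.executable` in place of a bare `python`.

```python
import subprocess
import sys

# Agent names taken from the usage line above.
AGENTS = ("DLD", "tfidf", "Ngram")


def imtihaan_commands(processed_list, test_file, agents=AGENTS,
                      script="atarashi/imtihaan.py"):
    """Return one imtihaan argument list per agent."""
    return [[sys.executable, script, processed_list, agent, test_file]
            for agent in agents]


if __name__ == "__main__":
    # Expected usage: python this_script.py /path/to/processedList.csv <testfile>
    if len(sys.argv) != 3:
        sys.exit("usage: <processedList.csv> <testfile>")
    for cmd in imtihaan_commands(sys.argv[1], sys.argv[2]):
        print("running:", " ".join(cmd))
        subprocess.run(cmd)
```

Run it from the repository root so that the relative path atarashi/imtihaan.py resolves.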

Creating Debian packages

  • Install dependencies

    # apt-get install python3-setuptools python3-all debhelper
    # pip install stdeb

  • Create Debian packages

    $ python3 setup.py --command-packages=stdeb.command bdist_deb

  • Locate the files under deb_dist
