issues Search Results · repo:decrypto-org/spider language:TeX

29 results

New Table: Crawls: Id human readable id on Paths: Secure flag could evolve over time, could be shifted to the content, meaning: content was found with secure flag true/false Contents: Add - Crawl ...
  • jogli5er
  • Opened on Dec 5, 2019
  • #38

For every newly discovered host, check for a robots.txt. Then, for every URL, we need to check whether access is allowed according to the robots.txt. This can be done either during insertion ...
todo
  • jogli5er
  • Opened on Nov 21, 2019
  • #37
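
The robots.txt check described in #37 could be sketched as follows. This is a minimal illustration using Python's standard `urllib.robotparser`, not the project's implementation; the user agent name `spider`, the helper names, and the example rules are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, as it might be fetched once per new host.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse an already-downloaded robots.txt body (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "spider") -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS)
    print(is_allowed(rules, "http://example.onion/index.html"))
    print(is_allowed(rules, "http://example.onion/private/data.html"))
```

Keeping one parsed rule set per host would let the same check run either at insertion time or right before a download.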

1. The compilation does not insert the references correctly 2. A few (ignorable) errors in the compilation with pdflatex
documentation
  • jogli5er
  • 1
  • Opened on Sep 24, 2018
  • #27

Currently, the table links defines a many-to-many relation between paths. This is incorrect. The links table must define a many-to-many relation between a content (which has an outgoing link) and path ...
  • dionyziz
  • 1
  • Opened on Aug 29, 2018
  • #26

We found examples in which our parser fails to produce a correct onion URI (see example below). To fix the issue, I will implement a more precise regex (as discussed on Slack): Make it exactly 16 or 56 characters ...
  • jogli5er
  • Opened on Jun 7, 2018
  • #25
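
A stricter onion host pattern along the lines of #25 might look like the sketch below. It assumes the host label uses the base32 alphabet (a-z, 2-7) and is exactly 16 or 56 characters long; the pattern and the helper name `find_onion_hosts` are illustrative, not the regex agreed on in the issue.

```python
import re

# Exactly 56 or exactly 16 base32 characters, followed by ".onion".
# The lookbehind rejects longer runs of base32 characters; the lookahead
# rejects hosts such as ".onionfoo".
ONION_HOST = re.compile(
    r"(?<![a-z2-7])(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion(?![a-z0-9-])"
)


def find_onion_hosts(text: str) -> list[str]:
    """Return all onion hosts with a 16- or 56-character label."""
    return ONION_HOST.findall(text.lower())


if __name__ == "__main__":
    short_host = "a" * 16 + ".onion"
    long_host = "c" * 56 + ".onion"
    bad_host = "b" * 20 + ".onion"  # wrong length, should be skipped
    page = f"http://{short_host}/x http://{bad_host}/ http://{long_host}/y"
    print(find_onion_hosts(page))
```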

The latest counts showed that we filter out a large part of the contents, either because they have a wrong mimetype (we should not even download those, see #23) or because the parser finds something that ...
bug
  • jogli5er
  • 2
  • Opened on May 31, 2018
  • #24

We found that we hit a lot of image URLs, which we won't store anyway; therefore, we should filter by file type.
enhancement
  • jogli5er
  • Opened on May 31, 2018
  • #23
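
One way to approximate the file type filter from #23 before downloading is to look at the extension in the URL path. The sketch below assumes a hand-picked set of image extensions and a hypothetical helper `should_download`; a real filter would probably also check the Content-Type header of the response.

```python
from posixpath import splitext
from urllib.parse import urlparse

# Assumed list of extensions to skip; adjust to the file types that are not stored.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".bmp", ".webp"}


def should_download(url: str) -> bool:
    """Return False when the URL path ends in a skipped (image) extension."""
    path = urlparse(url).path  # drops query string and fragment
    extension = splitext(path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


if __name__ == "__main__":
    for candidate in (
        "http://example.onion/index.html",
        "http://example.onion/img/logo.PNG?size=2",
        "http://example.onion/",
    ):
        print(candidate, should_download(candidate))
```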

Features: binary: set of words (vectorized); binary + weighting: binary vector multiplied with weights; frequency: bag of words (vectorized); frequency + weight: some function, e.g. log_2(freq_in_body) + ...
enhancement
  • jogli5er
  • Opened on May 31, 2018
  • #22
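
The two unweighted feature types listed in #22 (binary set of words and frequency bag of words) can be illustrated with a small sketch. Whitespace tokenization, the sorted vocabulary, and all function names are assumptions for this example; the weighted variants are left out because the issue text is truncated.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def build_vocabulary(documents: list[str]) -> list[str]:
    """Sorted list of all distinct words; fixes the vector dimensions."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def binary_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Set of words: 1 if the word occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


def frequency_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Bag of words: number of occurrences of each vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    docs = ["buy bitcoin bitcoin", "sell bitcoin"]
    vocab = build_vocabulary(docs)
    print(vocab)
    print(binary_vector(docs[0], vocab))
    print(frequency_vector(docs[0], vocab))
```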

The current solution has the following issue: We stored the subdomain as a denormalized column directly inside the baseUrls table. This, however, leads to the issue that we now have multiple entries per ...
bug
  • jogli5er
  • 1
  • Opened on May 31, 2018
  • #21

Currently, we have an issue with pages that have a considerable number of links to themselves, such as websites that list all bitcoin transactions, blocks or websites that host an extensive library and let ...
enhancement
  • jogli5er
  • Opened on May 24, 2018
  • #20