issues Search Results · repo:decrypto-org/spider language:TeX
29 results
New Table: Crawls: Id human readable id
on Paths: Secure flag could evolve over time, could be shifted to the content, meaning: content was found with secure
flag true/false
Contents: Add
- Crawl ...
jogli5er
- Opened on Dec 5, 2019
- #38
For every newly discovered host, check for a robots.txt. Then, for every URL, we need to check whether access is
allowed, depending on the robots.txt. This can be either done during insertion ...
todo
jogli5er
- Opened on Nov 21, 2019
- #37
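The per-URL check described in this issue could be sketched as follows. This is a minimal illustration using Python's standard `urllib.robotparser`, not the project's implementation; the user agent name `spider`, the example host, and the helper names `build_parser` and `is_allowed` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the body of a robots.txt that was already downloaded for a host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "spider") -> bool:
    """Return True if the given user agent may fetch the URL.

    The user agent name "spider" is a hypothetical placeholder.
    """
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for one host.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS)
```

Keeping one parser per host and consulting it before a URL is queued would match the "for every URL" wording of the issue; whether the check runs at insertion or at fetch time is left open there.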
1. The compilation does not insert the references correctly
2. A few (ignorable) errors in the compilation with pdflatex
documentation
jogli5er
- 1
- Opened on Sep 24, 2018
- #27
Currently, the table links defines a many-to-many relation between paths. This is incorrect. The links table must define
a many-to-many relation between a content (which has an outgoing link) and path ...
dionyziz
- 1
- Opened on Aug 29, 2018
- #26
We found examples in which our parser fails to produce a correct onion URI. (See example below) To fix the issue, I will
implement a more precise regex (as discussed on Slack): Make it exactly 16 or 56 characters ...
jogli5er
- Opened on Jun 7, 2018
- #25
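A stricter pattern along the lines of this issue might look like the sketch below. The issue text is truncated, so everything beyond "exactly 16 or 56 characters" is an assumption: the base32 alphabet `a-z2-7`, the case-insensitive matching, and the helper name `find_onion_hosts` are illustrative choices, not the regex that was agreed on.

```python
import re

# Assumed alphabet: base32 (a-z, 2-7). The label must be exactly 16 or 56
# characters long; the lookbehind rejects longer runs such as 17 characters.
ONION_HOST = re.compile(
    r"(?<![a-z2-7])(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b",
    re.IGNORECASE,
)


def find_onion_hosts(text: str) -> list[str]:
    """Return all onion hostnames found in the text, lowercased."""
    return [match.group(0).lower() for match in ONION_HOST.finditer(text)]
```

For example, `find_onion_hosts("see http://expyuzz4wqqyqhjn.onion/x")` yields the 16-character host, while a 17-character label followed by `.onion` yields no match.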
The latest counts showed that we filter out a large part of the contents, either because they have a wrong mimetype (we
should not even download those, see #23 ) or because the parser finds something that ...
bug
jogli5er
- 2
- Opened on May 31, 2018
- #24
We found that we hit a lot of image URLs, which we won't store anyway; therefore we should filter by file type
enhancement
jogli5er
- Opened on May 31, 2018
- #23
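One way to filter by file type before downloading is to look at the extension in the URL path. The sketch below is only an illustration under that assumption; the extension list and the name `should_skip` are hypothetical, and the issue does not say which mechanism was chosen.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical set of extensions to skip; the issue only mentions images.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".ico", ".svg"}


def should_skip(url: str) -> bool:
    """Return True if the URL path ends in an extension we do not store."""
    path = urlsplit(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in SKIPPED_EXTENSIONS
```

An extension check is cheap but cannot catch images served from extensionless URLs; those would still need a check on the response's content type.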
Features: binary: Set of words (vectorized) binary + weighting: binary vector multiplied with weights frequency: Bag of
words (vectorized) frequency + weight: some function, e.g. log_2(freq_in_body) + ...
enhancement
jogli5er
- Opened on May 31, 2018
- #22
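The binary and frequency features listed in this issue can be sketched as below. The vocabulary, the tokenization, and the function names are assumptions for illustration. The weighting formula is truncated in the issue, so only the visible `log_2(freq_in_body)` term is computed, and mapping absent words to 0 is an additional assumption.

```python
import math
from collections import Counter


def binary_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Set of words: 1 if the vocabulary word occurs in the tokens, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def frequency_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Bag of words: number of occurrences of each vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def log_frequency_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Only the log_2(freq_in_body) term; the rest of the formula is not given.

    Words that do not occur are mapped to 0.0 (an assumption).
    """
    return [math.log2(f) if f > 0 else 0.0 for f in frequency_vector(tokens, vocabulary)]


def weighted_binary_vector(
    tokens: list[str], vocabulary: list[str], weights: list[float]
) -> list[float]:
    """Binary vector multiplied element-wise with per-word weights."""
    return [b * w for b, w in zip(binary_vector(tokens, vocabulary), weights)]
```

With the hypothetical vocabulary `["bitcoin", "market", "forum"]` and the tokens `["bitcoin", "market", "bitcoin"]`, the binary vector is `[1, 1, 0]` and the frequency vector is `[2, 1, 0]`.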
The current solution has the following issue: We stored the subdomain as a denormalized column directly inside the
baseUrls table. This, however, leads to the issue that we now have multiple entries per ...
bug
jogli5er
- 1
- Opened on May 31, 2018
- #21
Currently, we have an issue with pages that have a considerable number of links to themselves, such as websites that list all
bitcoin transactions, blocks or websites that host an extensive library and let ...
enhancement
jogli5er
- Opened on May 24, 2018
- #20
