Contents of type "Inhaltsseite" won't get crawled #50
Comments
I'd guess the latter, could you pass the
Here is the output with the
Yeah, so it apparently did not recognize anything useful. I will have a look at it, but not before the ILIAS 7 migration in a few days, if that's alright with you. That one will probably absolutely slaughter the HTML parser anyway :P
Could you have a look at what https://github.com/Garmelon/PFERD/releases/tag/v3.3.0 produces, @Geronymos?
Even though PFERD 3.3 can download all regular content again (thank you for that!), it unfortunately still downloads nothing for those types of links. It does recognize that the page is of type content page, though (see explain log). As I see it, "Inhaltsseite" might be an option for the lecturer to write pure HTML. So maybe it could be handled like an "external link": the page itself is downloaded as plain text, and the links within the page are downloaded as well.
explain-log
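To make the suggestion above concrete, here is a minimal sketch of what "save the page as plain text and collect the links inside it" could look like. It uses only the Python standard library and is not PFERD's actual code; the names `LinkCollector` and `extract_content_page` are hypothetical, and the real ILIAS markup may need more careful handling than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the visible text and all <a href="..."> targets of a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links: list[str] = []
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative hrefs into absolute URLs
            self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract_content_page(html: str, base_url: str) -> tuple[str, list[str]]:
    """Hypothetical helper: returns (plain text, absolute link URLs)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.links
```

The returned text could be written to a file next to the other course content, while the link list would still have to be filtered before anything is downloaded.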
The "content page" has a "file" feature which I added support for. I thought they were nice enough to use it but they are not... I don't really want to crawl random pages linked by the content page - that could lead to weird network requests, errors when the remote file is behind authentication and so on. I was about to suggest writing a dedicated crawler type for the math page but they don't even link them there... So I guess I will have to find a compromise here.
All of these will lead to errors if there are links to files behind authentication.
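As an illustration of the concern about crawling arbitrary links, one conceivable way to bound it is to follow only links that point at the ILIAS host itself and to skip everything else. This is a sketch under that assumption, not a decision made in this thread; `split_links_by_host` is a hypothetical helper, and it does not address files behind authentication on the allowed host.

```python
from urllib.parse import urlparse


def split_links_by_host(
    links: list[str], allowed_host: str
) -> tuple[list[str], list[str]]:
    """Partition links into (same-host http(s) links, everything else)."""
    follow: list[str] = []
    skip: list[str] = []
    for link in links:
        parsed = urlparse(link)
        # Only plain web links on the allowed host are considered safe to follow
        if parsed.scheme in ("http", "https") and parsed.hostname == allowed_host:
            follow.append(link)
        else:
            skip.append(link)
    return follow, skip
```

The trade-off is that files hosted elsewhere, for example on an institute's own web server, would be skipped rather than downloaded.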
My analysis course uses the structure of "Inhaltsseite" (the icon looks like a laptop showing a diagram) to provide the script (which gets updated regularly) as well as the exercise sheets and their solutions.
Unfortunately I can't download them with pferd. I tried the command line, a config file that downloads the whole course, and an explicit URL, but nothing is working. When executing
pferd kit-ilias-web [url] .
it just says
And the folder stays empty.
Is this a misconfiguration on my end or is this type of structure not implemented yet?