BehatCrawler


BehatCrawler is an extension for Behat (with MinkExtension and Selenium2Driver) that crawls a given URL and executes user-defined functions on each crawled page.

Multiple crawling options are available; see the options table below.

Installation

composer require piopi/behatcrawler

Usage

Start by importing the extension into your FeatureContext (or any other context):

use Behat\Crawler\Crawler;

Create your Crawler object with the default configuration:

At this time, the crawler is only compatible with Selenium2Driver.

// The constructor takes the current Behat/Mink session
$crawler = new Crawler($this->getSession());

For custom settings, pass an array of options; see the table below for everything available.

$crawler = new Crawler($this->getSession(), ["internalLinksOnly" => true, "HTMLOnly" => true, "MaxCrawl" => 20]);

Available options (more functionality coming soon):

| Option            | Description | Default value |
| ----------------- | ----------- | ------------- |
| Depth             | Maximum depth that can be crawled from the starting URL | 0 (unlimited) |
| MaxCrawl          | Maximum number of pages to crawl | 0 (unlimited) |
| HTMLOnly          | Only crawl HTML/XHTML pages | true |
| internalLinksOnly | Only crawl internal links (links with the same domain name as the initial URL) | true |
| waitForCrawl      | Wait for the crawler to finish crawling before throwing any exception originating from the user-defined functions (compiles a list of all exceptions found with their respective locations) | false |

Options can be set either in the constructor or with the appropriate setters:

$crawler = new Crawler($this->getSession(), ["MaxCrawl" => 10]);
// or
$crawler->setMaximumCrawl(10);

Start Crawling

After creating and setting up the crawler, you can start crawling by passing your function as an argument:

Please refer to the PHP Callables documentation for more details.

Examples:

Closure::fromCallable is used to pass a private function as a parameter.

// function1 is a private method
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));
// function2 is a public method
$crawler->startCrawling([$this, 'function2']);

For functions with one or more arguments, pass the arguments as an array:

$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);
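
For example, a hypothetical function3 that checks each crawled page for a keyword could receive that keyword through the arguments array. This is only a sketch: it assumes the crawler forwards the array entries as positional arguments to the callable, and that your context extends MinkExtension's RawMinkContext (for assertSession()):

// Hypothetical helper: asserts that the page currently being crawled contains a keyword.
private function function3($keyword)
{
    // The shared Mink session already points at the crawled page.
    $this->assertSession()->pageTextContains($keyword);
}

// Runs the check on every crawled page, with 'Welcome' bound to $keyword:
$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), ['Welcome']);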

Usage Example

use Behat\Crawler\Crawler;

// Crawler with custom settings
$crawler = new Crawler($this->getSession(), ["internalLinksOnly" => true, "HTMLOnly" => true, "MaxCrawl" => 20, "waitForCrawl" => true]);
// Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); // Will start crawling
// Function with one or more arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);

In a Behat step function:

/**
 * @Given /^I crawl the website with a maximum of (\d+) level$/
 */
public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
{
    $crawler = new Crawler($this->getSession(), ["Depth" => $arg1]);
    $crawler->startCrawling([$this, 'test']);
}
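
The test method referenced above is the user-defined check that runs on every crawled page. A minimal sketch of such a check (hypothetical; it assumes the context extends MinkExtension's RawMinkContext and that the crawler navigates the shared session to each page before invoking the callable):

// Hypothetical per-page check: fail the step if any crawled page shows a PHP fatal error.
public function test()
{
    $this->assertSession()->pageTextNotContains('Fatal error');
}

Note that test is public here because it is passed directly as [$this, 'test']; a private method would need Closure::fromCallable, as shown earlier.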

Copyright

Copyright (c) 2020 Mostapha El Sabah [email protected]

Maintainers

Mostapha El Sabah (Piopi)
