Scrapy is controlled through the scrapy-ctl.py control script. The script provides several commands, for different purposes. Each command supports its own particular syntax. In other words, each command supports a different set of arguments and options.
This page doesn’t describe each command and its syntax, but provides an introduction to how the scrapy-ctl.py script is used. After you learn how to use it, you can get help for each particular command using the same scrapy-ctl.py script.
There is one global scrapy-ctl.py script shipped with Scrapy and another scrapy-ctl.py script automatically created inside your Scrapy project. The project-specific scrapy-ctl.py is just a thin wrapper around the global scrapy-ctl.py which populates the settings of your project, so you don’t have to specify them every time through the --settings argument.
The first thing you would do with the scrapy-ctl.py script is create your Scrapy project:
scrapy-ctl.py startproject myproject
That will create a Scrapy project under the myproject directory and will put a new scrapy-ctl.py inside that directory.
So, you go inside the new project directory:
cd myproject
And you’re ready to use your project’s scrapy-ctl.py. For example, to create a new spider:
python scrapy-ctl.py genspider mydomain mydomain.com
This is the same as using the global scrapy-ctl.py script and passing the project settings module in the --settings argument:
scrapy-ctl.py --settings=myproject.settings genspider mydomain mydomain.com
You’ll typically use the project-specific scrapy-ctl.py, for convenience.
To see all available commands type:
scrapy-ctl.py -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project (if any).
Example (active project):
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (no active project):
Scrapy X.X.X - no active project
Usage
=====
...
To get help about a particular command, including its description, usage and available options type:
scrapy-ctl.py <command> -h
Example:
scrapy-ctl.py crawl -h
Not all commands must be run from “inside” a Scrapy project. You can, for example, use the fetch command to download a page (using Scrapy built-in downloader) from outside a project. Other commands that can be used outside a project are startproject (obviously) and shell, to launch a Scrapy Shell.