Basic usage
This package provides scraping support for a multitude of light novel sources.
Each scraper has support for scraping novel information and chapter content, while providing optional support for search.
You can check which sources are supported at:
Let’s scrape a novel
Let’s say that we have a url and have already checked that its source is supported.
url = ...
We can begin by importing the module:
import novelsave_sources as nss
Now let’s get a crawler instance that works with the url above:
source = nss.locate_novel_source(url)()
In this example, we aren’t worried that it won’t find a matching crawler.
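If the url might come from an unsupported site, the lookup can be wrapped so that a failure is handled in one place. The following is only a sketch under stated assumptions: make_source is a hypothetical helper, not part of the package, and because this page does not say which exception the locator raises when nothing matches, it catches Exception broadly. The locator is passed in as an argument so the helper can be tried without network access.

```python
from typing import Any, Callable, Optional


def make_source(locate: Callable[[str], Callable[[], Any]], url: str) -> Optional[Any]:
    """Return a crawler instance for url, or None if no crawler matches.

    `locate` is expected to behave like nss.locate_novel_source: it takes a
    url and returns a crawler class, raising an exception when none matches.
    """
    try:
        source_type = locate(url)
    except Exception as exc:  # assumption: the exact exception type is not documented here
        print(f"No crawler found for {url!r}: {exc}")
        return None
    # The locator returns a class, so it still has to be instantiated.
    return source_type()
```

With the module imported as above, a call could look like make_source(nss.locate_novel_source, url), followed by a check for None before scraping.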
Now that we have a scraper, let’s finally get to scraping the novel.
novel = source.novel(url)
While this gives us all the chapters, the chapter objects do not yet have any content in them. For that we need to use another method:
for volume in novel.volumes:
    for chapter in volume.chapters:
        source.chapter(chapter)
Keep in mind that scraping chapters can take a long time when a novel has many chapters, since each chapter page is downloaded separately.
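Because a full download can run for a while, it may help to report progress, pause between requests, and keep going when a single chapter fails. The sketch below is one possible way to do that. fetch_chapters and its delay parameter are hypothetical names introduced for illustration, and the broad except clause is an assumption, since the errors a crawler may raise are not listed here.

```python
import time
from typing import Any, List


def fetch_chapters(source: Any, novel: Any, delay: float = 1.0) -> List[Any]:
    """Fill in the content of every chapter and return the chapters that failed."""
    # Flatten the volume/chapter nesting so progress can be counted.
    chapters = [c for volume in novel.volumes for c in volume.chapters]
    failed = []
    for index, chapter in enumerate(chapters, start=1):
        try:
            source.chapter(chapter)  # same call as in the loop above
        except Exception as exc:  # assumption: failure modes are not documented here
            failed.append(chapter)
            print(f"[{index}/{len(chapters)}] failed: {exc}")
        else:
            print(f"[{index}/{len(chapters)}] done")
        # Pause between requests, but not after the last one.
        if delay and index < len(chapters):
            time.sleep(delay)
    return failed
```

The returned list can be passed through the same loop again later to retry only the chapters that failed.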
That’s it, it’s that simple.
Find a novel
To get started we need a crawler that supports searching. We could iterate over all the crawlers until we find one that works for us. But for the sake of simplicity let’s import a specific scraper.
from novelsave_sources.sources.novel.novelpub import NovelPub
source = NovelPub()
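For reference, the alternative mentioned above, iterating over crawlers until one supports searching, might look roughly like the sketch below. It rests on assumptions: first_searchable is a hypothetical helper, the attribute name search_viable is a guess at how a crawler class advertises search support, and how to obtain the list of crawler classes is not covered on this page, so the list is simply passed in.

```python
from typing import Any, Iterable, Optional


def first_searchable(source_types: Iterable[type], flag: str = "search_viable") -> Optional[Any]:
    """Return an instance of the first crawler class that advertises search support."""
    for source_type in source_types:
        # assumption: search support is exposed as a boolean class attribute
        if getattr(source_type, flag, False):
            return source_type()
    # No class in the list claimed to support searching.
    return None
```

If the attribute is named differently, the flag argument can be changed without touching the loop.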
Great, now we have our scraper. Let’s search for, umm… “solo”.
novels = source.search('solo')
search returns a list of novel objects with minimal information. You will need to do further scraping to get the chapter list.
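The further scraping can reuse the novel method shown earlier. The sketch below illustrates the idea for the first few results. It assumes that each search result carries the address of its novel page in a url attribute, which this page does not confirm, and expand_results and limit are hypothetical names.

```python
from typing import Any, Iterable, List


def expand_results(source: Any, results: Iterable[Any], limit: int = 3) -> List[Any]:
    """Scrape full novel information for the first `limit` search results."""
    novels = []
    for result in list(results)[:limit]:
        # assumption: each search result exposes the novel page address as `.url`
        novels.append(source.novel(result.url))
    return novels
```

Limiting the number of results keeps the number of page downloads small, since every call to novel fetches a page.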
Retrieve metadata
Let’s assume you have a url that points toward the correct metadata source.
url = ...
This starts out much like scraping a novel: we must first find the correct crawler for the url.
metadata_source = nss.locate_metadata_source(url)()
And then to retrieve all the metadata:
metadata = metadata_source.retrieve(url)
This gives you a list of metadata objects.
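A flat list is often easier to work with once it is grouped. The sketch below collects values under their names. It assumes each metadata object has name and value attributes, which is not stated on this page, and group_metadata is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_metadata(items: Iterable[Any]) -> Dict[str, List[Any]]:
    """Group metadata values by name, keeping the original order of values."""
    grouped = defaultdict(list)
    for item in items:
        # assumption: each metadata object exposes `.name` and `.value`
        grouped[item.name].append(item.value)
    return dict(grouped)
```

Calling group_metadata(metadata) on the list retrieved above would then give one entry per metadata name.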