Basic usage

This package provides scraping support for a multitude of light novel sources.

Each scraper has support for scraping novel information and chapter content, while providing optional support for search.

You can check which sources are supported at:

Let’s scrape a novel

Let’s say that we have a URL and have already confirmed that its source is supported.

url = ...

We can begin by importing the module:

import novelsave_sources as nss

Now let’s get a crawler instance that works with the URL above:

source = nss.locate_novel_source(url)()

In this instance, we aren’t worried that it won’t find a matching crawler, because we have already checked that the source is supported.
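If you do need to guard against a URL that no crawler matches, a small wrapper along these lines may help. This is only a sketch: `try_locate` is a hypothetical helper, not part of the package, and since this page does not say what a failed lookup does, the wrapper simply treats any exception (or a non-callable return value) as "no crawler found".

```python
def try_locate(locate, url):
    """Return a crawler instance for ``url``, or None if the lookup fails.

    ``locate`` is a lookup function such as ``nss.locate_novel_source``.
    Assumption: a failed lookup either raises or returns something that
    cannot be instantiated; the exact behaviour is not documented here.
    """
    try:
        crawler_class = locate(url)
        return crawler_class()  # same "locate, then instantiate" step as above
    except Exception as exc:  # broad on purpose: the exception type is unknown
        print(f"No crawler available for {url!r}: {exc}")
        return None
```

It would be used as `source = try_locate(nss.locate_novel_source, url)`, followed by a check for `None` before scraping.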

Now that we have a scraper, let’s finally get to scraping the novel.

novel = source.novel(url)

This returns every chapter, but the chapter objects do not have any content yet. To fill them in, we need to call another method on each chapter:

for volume in novel.volumes:
    for chapter in volume.chapters:
        source.chapter(chapter)

Keep in mind that scraping chapters can take a long time, depending on the number of chapters, since each chapter page is downloaded separately.
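Because each chapter is a separate download, it can be useful to report progress and, optionally, pause between requests. The sketch below wraps the same `source.chapter(chapter)` call in a helper. `fetch_all_chapters`, `delay`, and `sleep` are names made up for this example and are not part of the package.

```python
import time


def fetch_all_chapters(source, novel, delay=0.0, sleep=time.sleep):
    """Download the content of every chapter in ``novel``, one page at a time.

    ``delay`` (seconds to wait between downloads) and ``sleep`` are
    hypothetical knobs added for this sketch. Returns the number of
    chapters processed.
    """
    # Flatten volumes into a single list so we know the total up front.
    chapters = [ch for volume in novel.volumes for ch in volume.chapters]
    total = len(chapters)
    for index, chapter in enumerate(chapters, start=1):
        source.chapter(chapter)  # same call as in the loop above
        print(f"[{index}/{total}] chapter downloaded")
        if delay and index < total:
            sleep(delay)  # pause between requests, not after the last one
    return total
```

Calling `fetch_all_chapters(source, novel, delay=1.0)` would then replace the nested loop while printing one progress line per chapter.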

That’s it. It’s that simple.

Find a novel

To get started, we need a crawler that supports searching. We could iterate over all the crawlers until we find one that works for us, but for the sake of simplicity, let’s import a specific scraper.

from novelsave_sources.sources.novel.novelpub import NovelPub

source = NovelPub()

Great, now we have our scraper. Let’s search for, say, “solo”.

novels = source.search('solo')

The search method returns a list of novel objects with minimal information. You will need to do further scraping to get the chapter list.
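One way to do that further scraping is to pass a search result back through `source.novel`, as in the first walkthrough. The sketch below assumes each search result exposes its page address as a `url` attribute, which this page does not confirm. `load_full_novel` is a hypothetical helper name.

```python
def load_full_novel(source, results, index=0):
    """Scrape the full novel information for one search result.

    Assumption: each item in ``results`` has a ``url`` attribute that
    ``source.novel`` accepts. Returns None when there are no results.
    """
    if not results:
        return None
    chosen = results[index]
    return source.novel(chosen.url)  # full scrape, including the chapter list
```

For example, `novel = load_full_novel(source, novels)` would scrape the first hit from the search above.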

Retrieve metadata

Let’s assume you have a URL that points to the correct metadata source.

url = ...

The first step is similar to scraping a novel: we must find the correct crawler for the URL.

metadata_source = nss.locate_metadata_source(url)()

And then to retrieve all the metadata:

metadata = metadata_source.retrieve(url)

This gives you a list of metadata objects.