Usage

Installation

You can install the package from the pypi

python3 -m pip install novelsave-sources

or directly from github

python3 -m pip install git+https://github.com/mensch272/novelsave_sources.git

Basic usage

This package provides scraping support for a multitude of light novel sources.

Each scraper has support for scraping novel information and chapter content, while providing optional support for search.

You can check which sources are supported at Support section.

Let’s scrape a novel

Let’s say that we have a url and we have already checked if the source is available.

url = ...

We can begin by importing the module

import novelsave_sources as nss

Now let’s get a crawler instance that works with the url above:

source = nss.locate_novel_source(url)()

In this instance, we arent worried it won’t find a matching crawler.

Now that we have a scraper, lets finally get to scraping the novel.

novel = source.novel(url)

While this gives all the chapters, the objects do not yet have any content in them for that we need to use another method.

for volume in novel.volumes:
    for chapter in volume.chapters:
        source.chapter(chapter)

Keep in mind that scraping chapters can take a lot of time depending on the amount of chapters since it will be downloading each chapter page.

That’s it, it’s that simple.

Find a novel

To get started we need a crawler that supports searching. We could iterate over all the crawlers until we find one that works for us. But for the sake of simplicity let’s import a specific scraper.

from novelsave_sources.sources.novel.novelpub import NovelPub

source = NovelPub()

Great, now we have our scraper. let’s search for umm… “solo”.

novels = source.search('solo')

search returns a list of novel objects with minimal information. You will need to do further scraping to get the chapter list.

Retrieve metadata

Let’s assume you have a url that points toward the correct metadata source.

url = ...

To start, it is similar to scraping a novel. We must first find the correct crawler for the url.

metadata_source = nss.locate_metadata_source(url)()

And then to retrieve all the metadata:

metadata = metadata_source.retrieve(url)

This gives you a list of metadata objects.

Examples

Below are a collection of examples that you may find useful.

Scraping a novel

import novelsave_sources as nss

# The novel url
url = ...

# Get source that can parse the url

try:
    source = nss.locate_novel_source(url)()
except nss.UnknownSourceException:  # source not found
    ...

# Scrape novel information including chapter
# table of contents
novel = source.novel(url)

# Download contents for all the chapters
for volume in novel.volumes:
    for chapter in volume.chapters:
        source.chapter(chapter)

Searching in a specific source

from novelsave_sources.sources.novel.novelpub import NovelPub

# Create the specific source
source = NovelPub()

# Search using the query word: 'solo'
novels = source.search('solo')

Retrieving metadata

import novelsave_sources as nss

# The metadata url
url = ...

# Get metadata source to parse the url
try:
    metadata_source = nss.locate_metadata_source(url)()
except nss.UnknownSourceException:  # source not found
        ...

# Retrieve the metadata
metadata = metadata_source.retrieve(url)

Searching in all supported sources

import novelsave_sources as nss

# Get all source types that can search
sources = [source() for source in nss.novel_source_types() if source.search_viable]

# The search query word
query = 'solo'

# Iterate and collect the novels found
novels = []
for source in sources:
    novels += source.search(query)