# Extract plaintext from library PDF files

These texts are used in the library pages.

    cp -R pdf pdf_to_text
    cd pdf_to_text
    for file in *.pdf; do pdftotext -layout "$file"; done
    rm -f *.pdf
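The same extraction can also be done without copying the PDFs first. The following is a minimal sketch, not part of this repository: the function name `extract_texts` is hypothetical, and it relies on `pdftotext` accepting an explicit output file as its second positional argument.

```shell
# Hypothetical helper: convert every PDF in directory "$1" into a .txt
# file in directory "$2", leaving the source directory untouched.
extract_texts() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for file in "$src"/*.pdf; do
        # Skip the unexpanded pattern when the directory holds no PDFs.
        [ -e "$file" ] || continue
        name=$(basename "$file" .pdf)
        # Report a failed conversion but keep processing the other files.
        pdftotext -layout "$file" "$dst/$name.txt" || echo "failed: $file" >&2
    done
}
```

For example, `extract_texts pdf pdf_to_text` should produce the same `.txt` files as the commands above while leaving `pdf/` unchanged.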

# Database

The tools used here are available for download at
<https://jena.apache.org/download/#apache-jena-binary-distributions>. Download both
"apache-jena-<version>.tar.gz" and "apache-jena-fuseki-<version>.tar.gz".

Databases can be created with:

    # Increase Java heap size
    # If there's not enough RAM, try with a different loader, such as --loader=basic
    export JVM_ARGS=-Xmx16G
    tdb2.tdbloader --loc=database_name *.nt

Place all the databases in the folder "fuseki_base/databases/".
To start the server, use the systemd service file "fuseki.service" available
in this repository. Before starting it, edit both "fuseki.service" and
"fuseki_base/configuration/dokk.ttl" to set paths and any other custom settings.

# Run website

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    # Set the Fuseki endpoint URI that the app will use for querying
    export FUSEKI_ENDPOINT="https://example.org:3030/dokk"
    gunicorn --reload --worker-connections=4 --threads=4 --bind 0.0.0.0:8080 --error-logfile=- app:application
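Because the app reads the endpoint from the environment, a small check before starting gunicorn can catch a missing or malformed value early. This is only a sketch: `check_endpoint` is a hypothetical helper that the repository does not provide, and it only inspects the URL scheme without contacting Fuseki.

```shell
# Hypothetical helper: succeed only when FUSEKI_ENDPOINT is set to an
# http:// or https:// URL. It does not test that the server is reachable.
check_endpoint() {
    case "${FUSEKI_ENDPOINT:-}" in
        http://*|https://*)
            return 0
            ;;
        *)
            echo "FUSEKI_ENDPOINT must be set to an http(s) URL" >&2
            return 1
            ;;
    esac
}
```

It could be chained in front of the server command, for example `check_endpoint && gunicorn ...` with the same options as above.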