| # Extract plaintext from library PDF files. These texts are used in the library pages.
cp -R pdf pdf_to_text
cd pdf_to_text
for file in *.pdf; do pdftotext -layout "$file"; done
rm -f *.pdf
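The same conversion can also be done without copying the PDFs first, because pdftotext accepts an optional output path as its second argument. The following is only a sketch: "extract_pdf_text" is a hypothetical helper name, not something this repository provides, and unlike the "cp -R" approach it only looks at PDFs in the top level of the source folder.

```shell
# Hypothetical helper (not part of this repository): convert every PDF in
# SRC_DIR to a .txt file in OUT_DIR, leaving the original PDFs untouched.
extract_pdf_text() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for file in "$src_dir"/*.pdf; do
        # Skip the unexpanded pattern when the folder contains no PDFs.
        [ -e "$file" ] || continue
        name=$(basename "$file" .pdf)
        # pdftotext takes an optional output path as its second argument.
        pdftotext -layout "$file" "$out_dir/$name.txt"
    done
}
```

With this helper defined, "extract_pdf_text pdf pdf_to_text" would produce the same .txt files as the commands above.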
# Database
The tools used here can be downloaded from
<https://jena.apache.org/download/#apache-jena-binary-distributions>. Download
"apache-jena-<version>.tar.gz" (which provides tdb2.tdbloader) and
"apache-jena-fuseki-<version>.tar.gz" (which provides the Fuseki server).
Databases can be created with:
tdb2.tdbloader --loc=database_name *.ttl
Place all the databases into the folder "fuseki_base/databases/".
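As a sketch, a database can also be loaded straight into that folder instead of being moved there afterwards. "load_database" is a hypothetical helper name, and it assumes it is run from the repository root with tdb2.tdbloader on the PATH.

```shell
# Hypothetical helper (not part of this repository): build a TDB2 database
# directly under fuseki_base/databases/.
# Usage: load_database NAME FILE.ttl [FILE.ttl ...]
load_database() {
    name=$1
    shift
    mkdir -p fuseki_base/databases || return 1
    # The remaining arguments are passed through as the files to load.
    tdb2.tdbloader --loc="fuseki_base/databases/$name" "$@"
}
```

For example, "load_database database_name *.ttl" corresponds to the command above, with the result already in the expected folder.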
To start the server, install and start the systemd service defined in "fuseki.service", available
in this repository. Before starting it, edit both "fuseki.service" and "fuseki_base/configuration/dokk.ttl"
to set the paths and any other custom settings.
# Run website:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
gunicorn --reload --worker-connections=4 --threads=4 --bind 0.0.0.0:8080 --error-logfile=- app:application
# Crawl website for static HTML pages:
wget2 --mirror --max-threads=16 --page-requisites --adjust-extension --execute robots=off localhost:8080
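The crawl only works while the gunicorn server is answering on port 8080. A minimal sketch of a readiness check, assuming curl is installed; "wait_for_site" is a hypothetical helper name and the default of 30 attempts is an arbitrary choice:

```shell
# Hypothetical helper (not part of this repository): poll URL once per second
# until it answers successfully or the attempts run out.
# Usage: wait_for_site URL [ATTEMPTS]
wait_for_site() {
    url=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # --fail makes curl exit non-zero on HTTP error responses.
        if curl --silent --fail --output /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

It could then guard the crawl, for example by running "wait_for_site http://localhost:8080" and starting wget2 only if it succeeds.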
|