Have you ever worked with WACZ files? If not, starting out is very easy: https://ovarit.com/o/STEM/511014/simple-tutorial-for-people-starting-out-with-making-personal-offline-archives-of
In addition to the ArchiveExpress method I showed in that tutorial post, there are other ways to make WACZ files!
Using the WebRecorder ArchiveWeb.page Chrome browser extension or the desktop app is very easy, but it requires going into each page manually, which can be VERY tedious, so I can only use it for my own stuff and for some small circles at most...
If anyone here could get Browsertrix working, it would be an IMMENSE help with the Ovarit archiving efforts! (A rough sketch of one way to run it is below.)
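For anyone who wants to try: here is a minimal, untested sketch of running the Browsertrix crawler through Docker. The flags come from the webrecorder/browsertrix-crawler README and may differ between versions, and the circle URL, page limit, and collection name are placeholder examples, not a tested Ovarit configuration.

```bash
# Minimal sketch: crawl one circle and write a WACZ (flag names may vary by version).
docker run -it --rm \
  -v "$PWD/crawls:/crawls/" \
  webrecorder/browsertrix-crawler crawl \
  --url "https://ovarit.com/o/STEM/" \
  --scopeType prefix \
  --limit 50 \
  --generateWACZ \
  --collection ovarit-stem-sample
# The finished WACZ should end up under ./crawls/collections/ovarit-stem-sample/
```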
WACZ files can archive images and even videos... They can pretty much be used to replicate identical copies of websites...
WACZ files can be loaded in ReplayWeb, either online here: https://replayweb.page/ or offline with the desktop app: https://github.com/webrecorder/replayweb.page/releases/tag/v2.3.4
I made my sample archive with a WACZ file: https://ovarit.com/o/STEM/678680/small-sample-archive-of-ovarit-i-made
Tutorial to make an archive like my own: https://ovarit.com/o/STEM/510775/host-your-own-web-archives-on-glitch
My archive is basically powered by ReplayWeb...
A full archive of Ovarit could be like that sample archive, but MUCH bigger and at a more permanent URL with a different name!
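For reference, here is a minimal sketch of what a ReplayWeb-powered static site like that looks like, based on the replayweb.page embedding docs. The file names, CDN URLs, and my-archive.wacz are my assumptions, so check the current docs before copying it.

```bash
# Sketch: build a static site that plays my-archive.wacz with ReplayWeb.page.
mkdir -p site/replay
cp my-archive.wacz site/

# Page that embeds the ReplayWeb.page viewer and points it at the WACZ file.
cat > site/index.html <<'EOF'
<!doctype html>
<html>
  <body>
    <script src="https://cdn.jsdelivr.net/npm/replaywebpage/ui.js"></script>
    <replay-web-page source="my-archive.wacz"
                     url="https://ovarit.com/"
                     replayBase="./replay/"></replay-web-page>
  </body>
</html>
EOF

# Service worker stub required by the embed; it just loads ReplayWeb.page's worker.
cat > site/replay/sw.js <<'EOF'
importScripts("https://cdn.jsdelivr.net/npm/replaywebpage/sw.js");
EOF

# Upload the site/ folder to any static host (Glitch, Netlify, etc.).
```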
There are already people interested in hosting an archive of Ovarit, and some have even bought domains ready for it, but we need archive files!
I can’t stay and chat, but I will leave this here:
I might have a lead on automating WACZ collection without triggering Ovarit’s bandwidth restrictions. This weekend, I will test out my plan (below) to see if it’s doable. If I manage to export a WACZ file, I will send it to @femina, if she’s willing, and perhaps she can test whether the file is actually usable, since she’s familiar with ReplayWeb?
Install ArchiveBox.
Write a Bash script to feed the collected URLs through ArchiveBox’s CLI (on a timed loop with pauses, to avoid triggering Ovarit’s bandwidth restrictions), with ArchiveBox set to the lowest settings possible to save hard drive space and computer resources (RAM and CPU). (A rough sketch of this and the conversion step is below, after these steps.)
(Edit: it would really help if I knew what those bandwidth restrictions were. Should I pause every 5 seconds? 30 seconds? If I err on the side of too cautious, we won’t get much archived.)
Install webrecorder/py-wacz
Convert WARC files to WACZ using py-WACZ
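Here is a rough, untested sketch of what steps 2 and 4 could look like, assuming a file called urls.txt with one collected URL per line. The ArchiveBox config keys, the 30-second pause, and the output paths are my assumptions (key names can differ by ArchiveBox version, and the pause is a guess until we know the real bandwidth restrictions); the wacz create call follows the webrecorder/py-wacz README.

```bash
#!/usr/bin/env bash
set -euo pipefail

# One-time setup: create an ArchiveBox collection and turn off the heavier
# extractors to save disk, RAM, and CPU; keep WARC output, since that is what
# py-wacz converts. (Config key names may vary by ArchiveBox version.)
mkdir -p ovarit-archive && cd ovarit-archive
archivebox init
archivebox config --set SAVE_MEDIA=False
archivebox config --set SAVE_SCREENSHOT=False
archivebox config --set SAVE_PDF=False
archivebox config --set SAVE_WARC=True

# Feed URLs one at a time with a pause between requests, so we stay under
# whatever bandwidth restrictions Ovarit enforces (30s is a placeholder).
while read -r url; do
  archivebox add --depth=0 "$url"
  sleep 30
done < ../urls.txt

# Bundle every WARC that ArchiveBox produced into a single WACZ with py-wacz.
find ./archive -name '*.warc.gz' -print0 | xargs -0 wacz create -o ../ovarit.wacz
```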
If successful, I will share my scripts, and I will make them as easy to use as possible.
I’m in the middle of a project that collects all the URLs of each circle in chronological order. Once I have these lists (I will post them), it will be easier to coordinate group archival efforts.
I have a script you can DM me for, but I would rather hold off and see what girl_undone has to offer, because the last thing this website needs is a ton of gremlins running query requests with the sleep time set incredibly low.
I would need to trust that you just want it for knowledge, small personal loads, or for repurposing for other websites.
Wondering: does Browsertrix interact with the site for you, and retain HTTP request information? If it crawls online for you, it might not be that big of a deal, but if it uses a server close to you and doesn't wipe identifying info, your approximate location will be known. You could probably edit the files to remove it, but this format is honestly a pain.
I'm not too sure about your Browsertrix question because I don't know how to use it, but I think it can be used online in the cloud or deployed locally...
I accept your script! I've never used one before, but I wanna take a look... I won't use it right now!
Do you or anyone else know what size these WACZ files typically end up? I know that some posts have a ton more comments than others, or there are image posts, so what range of file size could be expected per file?
A WACZ is like a compressed zip file of many individual pages. You can WACZ one page or a group of pages, and that will affect how big it is.
Wait until after this weekend to worry about mass scraping. We’re going to see about making quality flat files, and someone promising might be able to host.
Thank you!
Are you planning on making the flat files publicly downloadable, or just giving it to those who host?
Not sure yet.
Ok then!