help clicking "next" using command line in project for archiving post title timelines & redirect links (see details)
Posted March 24, 2025 by Maplefields in Ovarit

disclaimer: I'm a self-taught beginner. I don't know how to write webpage scripts. I've only just started learning the Linux bash command line at my own slow pace, and I'm applying what I've learned so far. I don't know any programming languages like Python or JavaScript, and I don't have time to learn one right now. I am familiar with loops and basic scripting in Bash.

Update (success!):

Tentative solution:

next=$(grep -E "after=" pg2.txt | cut -d "=" -f 3 | cut -d "\"" -f 1)

I have no problems looping the variable in Git Bash. I'm in the process of downloading all the post title & link data (not the posts themselves) from o/GC in raw .txt format.
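Roughly, the loop looks like this (a sketch rather than my exact script; the page cap of 500 and the 2-second delay are arbitrary, and the head -n 1 assumes the first "after=" match on each page is the "next" link):

next="Njc1ODE3"   # token copied by hand from the first page
for i in $(seq 3 500); do
    curl "https://ovarit.com/o/GenderCritical/new?after=$next" --ssl-no-revoke > "pg$i.txt"
    next=$(grep -E "after=" "pg$i.txt" | head -n 1 | cut -d "=" -f 3 | cut -d "\"" -f 1)
    [ -z "$next" ] && break   # no "next" link found: last page reached
    sleep 2   # small delay so the server isn't hammered
done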

Currently working on creating a script to extract the data.

Goal:

To go through every page of posts in each GC-related circle and collect the post titles and URL links (Ovarit links and article links) in chronological order.

To get something like this: [date] [title] [ovarit URL link] [article redirect link] [ovarit archive link]

(a list for each circle).

Current problem (update: working a new angle):

How do you "click" the "next" button from the command line (currently using Git Bash*, where only the curl command works)?

I've gotten as far as downloading the page source using curl.

I'm confident I can extract the data I need, given enough time to fine-tune the grep (or awk) command and the script, and output it into a neat txt or markdown file. I'll share the extracted text on Ovarit (this may help the archivers organize what has already been archived) and Saidit (I need to create an account).

$ curl "https://ovarit.com/o/GenderCritical/new?after=Njc1ODE3" --ssl-no-revoke >pg3.txt
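For the extraction itself, a generic starting point might be pulling out every anchor tag with its link and visible text (this is a guess at a first pass; the exact pattern depends on Ovarit's real HTML, which I still need to study in the saved pages):

$ grep -oE '<a[^>]*href="[^"]*"[^>]*>[^<]*</a>' pg3.txt

From there, sed or awk can split each match into the [title] and [URL] columns for the list format above.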

Unfortunately, I don't understand the logic behind "after=Njc1ODE3". How does the "next" button decide what seemingly random string to assign the next page? I've been manually clicking "next" and copying the URL into the CLI to figure out a procedure, but I want this process automated, so that the script finds the "next" button and "clicks" it automatically.
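That said, the string may not be random at all: it looks like plain base64. Decoding it in Git Bash gives a number, presumably the ID of the last post on the page (cursor-style paging):

$ echo "Njc1ODE3" | base64 -d
675817

If that's right, there's no formula to reverse: each page's source simply contains the ready-made after= token for the page that follows it, which is what the grep command above already extracts.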

Problem 2:

How do I check, from the CLI, whether an Ovarit URL has been archived on archive.is/.ph/.li, etc.?

I want to retrieve a true/false response and, if possible, an archive link.

I want to post this list so that Ovarit archivers using archive.is know what's already been done.
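One idea I want to test (I haven't confirmed this behavior): archive.today (archive.ph/.is/.li) appears to serve a /newest/<url> endpoint that redirects to the newest snapshot of <url> when one exists. If that holds, curl's -w variables give a true/false check plus the link:

is_archived() {
    local code
    code=$(curl -s -o /dev/null -w "%{http_code}" --ssl-no-revoke "https://archive.ph/newest/$1")
    if [ "$code" = "200" ] || [ "$code" = "302" ]; then
        # follow the redirect and print the snapshot URL
        curl -s -o /dev/null -w "%{url_effective}\n" -L --ssl-no-revoke "https://archive.ph/newest/$1"
    else
        echo "false"
    fi
}

$ is_archived "https://ovarit.com/o/GenderCritical"

(One caveat: archive.today is known to throttle automated requests, so this would need delays between checks.)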

Idea that failed:

I thought I could use the Firefox extension PageZipper, which clicks "next" for you and appends each following page as you scroll (I got through 60 pages in about a minute of scrolling). But when I viewed the page source (and tried to save the .htm file), the appended pages didn't appear. In hindsight that makes sense: "view source" shows the HTML as originally downloaded, not the pages PageZipper later injects into the live page with JavaScript.

* asterisk note:

(*) Unfortunately, this is a busy month for me (and April will be too), and I'm stuck using Windows, so I can't boot into Linux for now. I'll need this bash script (once I figure it out) to run in the background on Windows while I do my other work. If it were May, I'd have more time to research how to archive Ovarit on archive.is without manually copying and pasting links. Ideally, we'd have an automated script that splits up the work, so each volunteer would only need to run it once overnight (or over several nights) and share the text output with the group (see the sketch below).
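As a rough sketch of the splitting idea (hypothetical: links.txt is a shared master list of post URLs, and each volunteer picks a distinct number N between 0 and M-1), volunteer N of M takes every M-th line, so nobody duplicates work:

N=2; M=5    # this volunteer is number 2 of 5
awk -v n="$N" -v m="$M" 'NR % m == n' links.txt > my_share.txt

Each person then archives only the links in their own my_share.txt and shares the results with the group.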

Why I'm omitting archive.org

It's very easy to send archive.org an email request to delete archived webpages. I've seen the Kfarm archives get entirely wiped from their database.

About archiving each post

I already read the threads. It's impossible for one IP address to download the entire Ovarit site within a month. Volunteers would have to organize as a group and divide the task among many IP addresses for this to succeed.
