Duplicate downloading checking + fix startFromOffset #183

treetrum · 2025-02-21T08:06:48Z

Changes:

Adds a simple check to see whether a book has already been downloaded (by checking the local file's existence and file size); if it has, the request is aborted to save time on reruns of the tool
Fixes the behaviour of startFromOffset by offsetting the list of books we fetch instead of offsetting the individual book downloads
Adds a new flag --duplicateHandling with the options skip or overwrite to control the duplicate download check
Updates the default for --totalDownloads to be Infinity
Downloaded file names now include the unique Amazon identifier (ASIN) to hopefully prevent issues where different editions of the same book were being overwritten when they shared a file name. Should address Duplicate Books Handling #185 and Missing failed download message? Counts don't add up. #186
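
For readers skimming the thread, the duplicate check and the ASIN-based file names described above could look roughly like the sketch below. This is not the code from this PR: `buildFileName`, `isAlreadyDownloaded`, `shouldDownload`, the bracketed ASIN format, and the `bin` extension are all hypothetical, and the expected size is assumed to be known before the body is downloaded (for example from a response header).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Mirrors the two options named for --duplicateHandling in the PR description.
type DuplicateHandling = "skip" | "overwrite";

// Hypothetical helper: puts the ASIN in the file name so that two editions
// with the same title end up in different files.
function buildFileName(title: string, asin: string, extension: string): string {
  // Replace characters that are not allowed in file names on common systems.
  const safeTitle = title.replace(/[\\/:*?"<>|]/g, "_").trim();
  return `${safeTitle} [${asin}].${extension}`;
}

// Hypothetical helper: true when a regular file exists at filePath and its
// size matches expectedSize. Any stat error (such as a missing file) is
// treated as "not downloaded".
function isAlreadyDownloaded(filePath: string, expectedSize: number): boolean {
  try {
    const stats = fs.statSync(filePath);
    return stats.isFile() && stats.size === expectedSize;
  } catch {
    return false;
  }
}

// Decides whether the download should proceed for the given mode.
function shouldDownload(
  filePath: string,
  expectedSize: number,
  mode: DuplicateHandling,
): boolean {
  if (mode === "overwrite") return true;
  return !isAlreadyDownloaded(filePath, expectedSize);
}

// Example usage against a temporary directory (all values are made up).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "duplicate-check-"));
const demoFile = path.join(demoDir, buildFileName("My: Book", "B000TEST01", "bin"));
fs.writeFileSync(demoFile, "hello"); // 5 bytes on disk
console.log(shouldDownload(demoFile, 5, "skip"));
console.log(shouldDownload(demoFile, 5, "overwrite"));
```

Comparing sizes as well as existence means a partially written file from an interrupted run is not mistaken for a finished download.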

HamsterExAstris · 2025-02-21T16:44:10Z

Good news: the duplicate downloading checking appears to be working great.

Bad news: the startFromOffset changes appear to have broken something else. Where previously it found all ~2200 books in my account with no pagination parameters (just --baseUrl), it's now only finding 200 and then stopping.

| Found 200 books in total

jsonbecker · 2025-02-21T19:39:56Z

Can confirm the same: the skipping of existing downloads works, the offset fix does not. Checking out e16378c alone was a huge improvement, so I wonder if it's worth pulling the offset fix out into its own PR.

treetrum · 2025-02-21T20:07:04Z

Thanks for the reports! I ~~think I will split off the offset fix into its own PR then.~~

treetrum · 2025-02-21T21:18:46Z

I think I found the culprit of only downloading 200 books. It was a simple fix, so I will keep it in this PR for now. @jsonbecker / @HamsterExAstris I would really appreciate it if either of you could pull the latest and retest before I merge this.
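
The thread does not show the actual fix, so the following is only an illustrative sketch of the general pattern under discussion: fetch every page of the book list first, then apply startFromOffset and totalDownloads to that list. `fetchAllItems`, `selectItems`, the `FetchPage` callback, and the `hasMore` field are hypothetical names, not the tool's API.

```typescript
interface ContentItem {
  asin: string;
  title: string;
}

// Hypothetical page shape: a batch of items plus a flag for further pages.
interface Page {
  items: ContentItem[];
  hasMore: boolean;
}

// Hypothetical callback that fetches one batch starting at startIndex.
type FetchPage = (startIndex: number, batchSize: number) => Promise<Page>;

// Keeps requesting pages until the source reports no more items, instead of
// stopping after the first batch.
async function fetchAllItems(
  fetchPage: FetchPage,
  batchSize: number,
): Promise<ContentItem[]> {
  const all: ContentItem[] = [];
  let startIndex = 0;
  while (true) {
    const page = await fetchPage(startIndex, batchSize);
    all.push(...page.items);
    // An empty page also ends the loop, which guards against looping forever.
    if (page.items.length === 0 || !page.hasMore) break;
    startIndex += page.items.length;
  }
  return all;
}

// Applies the offset and the download limit to the list of books rather than
// to individual downloads. A limit of Infinity selects everything after the
// offset, because slice clamps its end index to the array length.
function selectItems(
  items: ContentItem[],
  startFromOffset: number,
  totalDownloads: number,
): ContentItem[] {
  return items.slice(startFromOffset, startFromOffset + totalDownloads);
}
```

With this shape, a batch size of 200 only controls how many requests are made, not how many books are found in total.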

jsonbecker · 2025-02-21T21:26:11Z

I didn't actually redownload, but it does appear to find the full count of books now (and didn't download them because I already have them).

HamsterExAstris · 2025-02-21T22:37:32Z

Can confirm the current version in the branch fixes the 200 book limit.

It looks like the duplicate handling is working too. I didn't think it was at first, but that's because the file names changed to append ASIN. I think that's a good idea because it fixes other issues reported when a user has multiple books with the same title; but it might cause some confusion over why duplicates aren't being skipped.

warriordog · 2025-02-21T23:39:02Z

The latest version works for me. Was able to download 3800 books with only 63 failures.

treetrum mentioned this pull request Feb 21, 2025

RFE - caching downloads #162

Closed

treetrum linked an issue Feb 21, 2025 that may be closed by this pull request

RFE - caching downloads #162

Closed

treetrum force-pushed the feature/skip-already-downloaded-books branch from 97ba4b3 to e16378c Compare February 21, 2025 11:05

treetrum changed the title ~~WIP Don't redownload already downloaded books~~ WIP Don't redownload already downloaded books + fix startFromOffset Feb 21, 2025

treetrum changed the title ~~WIP Don't redownload already downloaded books + fix startFromOffset~~ Duplicate downloading checking + fix startFromOffset Feb 21, 2025

treetrum mentioned this pull request Feb 21, 2025

Downloads Maxes out at 10024 books in total #181

Open

treetrum commented on src/index.ts Feb 21, 2025 (outdated)

treetrum linked an issue Feb 21, 2025 that may be closed by this pull request

Duplicate Books Handling #185

Closed

This was referenced Feb 21, 2025

Duplicate Books Handling #185

Closed

Missing failed download message? Counts don't add up. #186

Closed

treetrum linked an issue Feb 21, 2025 that may be closed by this pull request

Missing failed download message? Counts don't add up. #186

Closed

treetrum mentioned this pull request Feb 21, 2025

Duplicate download handling using HEAD request #187

Closed

treetrum marked this pull request as ready for review February 21, 2025 22:14

treetrum added 4 commits February 22, 2025 09:21

WIP Don't redownload books that are already on the correct size on the filesystem

7043886

Fix startFromOffset, update set default totalDownloads to MAX_SAFE_INTEGER

f076a7b

Fix getAllContentItems pagination logic

e6ffbd5

Add --duplicateHandling flag, add ASIN to download file to ensure uniqueness across editions

bb32cd4

treetrum force-pushed the feature/skip-already-downloaded-books branch from 631a400 to bb32cd4 Compare February 21, 2025 22:21

treetrum merged commit c26aeef into main Feb 22, 2025
1 check passed

treetrum deleted the feature/skip-already-downloaded-books branch February 22, 2025 07:24
