Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate downloading checking + fix startFromOffset #183

Merged
merged 4 commits into from
Feb 22, 2025

Conversation

treetrum
Copy link
Owner

@treetrum treetrum commented Feb 21, 2025

Changes:

  • Adds a simple check to see if a book has already been downloaded (by checking the local files existence + filesize) and if its already downloaded will abort the request to save time on reruns of the tool
  • Fixes the behaviour of startFromOffset by offsetting the list of books we fetch instead of offsetting the individual book downloads
  • Adds a new flag --duplicateHandling with options for either skip or overwrite to control the duplicate download check
  • Updates the default for --totalDownloads to be Infinity
  • Downloaded file names now include the unique amazon indentifier (ASIN) to hopefully prevent issues where different editions of the same book were being overwritten when they shared a file name. Should address Duplicate Books Handling #185 and Missing failed download message? Counts don't add up. #186

@treetrum treetrum mentioned this pull request Feb 21, 2025
@treetrum treetrum linked an issue Feb 21, 2025 that may be closed by this pull request
@treetrum treetrum force-pushed the feature/skip-already-downloaded-books branch from 97ba4b3 to e16378c Compare February 21, 2025 11:05
@treetrum treetrum changed the title WIP Don't redownload already downloaded books WIP Don't redownload already downloaded books + fix startFromOffset Feb 21, 2025
@treetrum treetrum changed the title WIP Don't redownload already downloaded books + fix startFromOffset Duplicate downloading checking + fix startFromOffset Feb 21, 2025
@HamsterExAstris
Copy link

Good news: the duplicate downloading checking appears to be working great.

Bad news: The startFromOffset changes appear to have broken something else. Where previously it found all ~2200-ish books in my account with no pagination parameters (just --baseUrl), it's now only finding 200 and then stopping.

| Found 200 books in total

@jsonbecker
Copy link

Can confirm the same-- the skipping of existing downloads work, the offset fix does not. Checking out e16378c alone was a huge improvement, so I wonder if it's worth reverting the offset fix into it's own PR.

@treetrum
Copy link
Owner Author

treetrum commented Feb 21, 2025

Thanks for the reports! I think I will split off the offset fix into its own PR then.

@treetrum
Copy link
Owner Author

I think I found the culprit of only downloading 200 books. Was a simple fix so will keep in this PR for now. @jsonbecker / @HamsterExAstris I would really appreciate if either of you could pull the latest and retest before I merge this?

@jsonbecker
Copy link

I didn't actually redownload, but it does appear to find the full count of books now and (and didn't download them because I have them).

@treetrum treetrum linked an issue Feb 21, 2025 that may be closed by this pull request
@treetrum treetrum linked an issue Feb 21, 2025 that may be closed by this pull request
@treetrum treetrum marked this pull request as ready for review February 21, 2025 22:14
@treetrum treetrum force-pushed the feature/skip-already-downloaded-books branch from 631a400 to bb32cd4 Compare February 21, 2025 22:21
@HamsterExAstris
Copy link

Can confirm the current version in the branch fixes the 200 book limit.

It looks like the duplicate handling is working too. I didn't think it was at first, but that's because the file names changed to append ASIN. I think that's a good idea because it fixes other issues reported when a user has multiple books with the same title; but it might cause some confusion over why duplicates aren't being skipped.

@warriordog
Copy link

The latest version works for me. Was able to download 3800 books with only 63 failures.

@treetrum treetrum merged commit c26aeef into main Feb 22, 2025
1 check passed
@treetrum treetrum deleted the feature/skip-already-downloaded-books branch February 22, 2025 07:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing failed download message? Counts don't add up. Duplicate Books Handling RFE - caching downloads
4 participants