Error Mongo: allowDiskUse:true to opt in #54

Open
rexdouglass opened this issue Jan 18, 2019 · 2 comments


rexdouglass commented Jan 18, 2019

~/duplicate-images$ python3 duplicate_finder.py find
Started database...
Traceback (most recent call last):
  File "duplicate_finder.py", line 358, in <module>
    dups = find(db, args['--match-time'])
  File "duplicate_finder.py", line 238, in find
    "total": {"$gt": 1}
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 2397, in aggregate
    **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 2304, in _aggregate
    client=self.__database.client)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 579, in command
    unacknowledged=unacknowledged)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 150, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/helpers.py", line 155, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.


senmaoy commented Dec 8, 2020

I hit the same error when processing a 50 GB dataset!


7633 commented Jan 26, 2021

I fixed this by adding allowDiskUse=True to the db.aggregate call in find:

def find(db, match_time=False):
    dups = db.aggregate([{
        "$group": {
            "_id": "$hash",
            "total": {"$sum": 1},
            "items": {
                "$push": {
                    "file_name": "$_id",
                    "file_size": "$file_size",
                    "image_size": "$image_size",
                    "capture_time": "$capture_time"
                }
            }
        }
    },
    {
        "$match": {
            "total": {"$gt": 1}
        }
    }], allowDiskUse=True)

    if match_time:
        dups = (d for d in dups if same_time(d))

    return list(dups)
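
For anyone who wants to check the change without a running MongoDB server, here is a minimal sketch of the same idea. The error comes from the `$group` stage exceeding MongoDB's per-stage memory limit (100 MB), and `allowDiskUse=True` lets the server spill to temporary files instead of failing. The names `build_duplicate_pipeline`, `find_duplicates`, and `FakeCollection` are hypothetical and not part of duplicate_finder.py; the fake collection only records what it is called with, and `collection` is assumed to be a pymongo `Collection` in real use.

```python
def build_duplicate_pipeline():
    """Return the two-stage pipeline from the snippet above (hypothetical helper)."""
    return [
        {
            # Group documents by image hash and collect file details per group.
            "$group": {
                "_id": "$hash",
                "total": {"$sum": 1},
                "items": {
                    "$push": {
                        "file_name": "$_id",
                        "file_size": "$file_size",
                        "image_size": "$image_size",
                        "capture_time": "$capture_time",
                    }
                },
            }
        },
        # Keep only hashes that occur more than once, i.e. duplicates.
        {"$match": {"total": {"$gt": 1}}},
    ]


def find_duplicates(collection, allow_disk_use=True):
    """Run the pipeline, passing allowDiskUse through to aggregate().

    `collection` is assumed to behave like a pymongo Collection.
    """
    cursor = collection.aggregate(
        build_duplicate_pipeline(), allowDiskUse=allow_disk_use
    )
    return list(cursor)


class FakeCollection:
    """Hypothetical stand-in that records aggregate() arguments (no server needed)."""

    def __init__(self, results):
        self.results = results
        self.calls = []

    def aggregate(self, pipeline, **kwargs):
        self.calls.append((pipeline, kwargs))
        return iter(self.results)


if __name__ == "__main__":
    fake = FakeCollection([{"_id": "abc", "total": 2, "items": []}])
    print(find_duplicates(fake))
    print(fake.calls[0][1])
```

Keeping the option as a parameter means it can be switched off on small collections, where spilling to disk is unnecessary.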
