no support for filenames containing unicode (Hebrew in my case) #38

Open
eyal-rounds opened this issue May 25, 2018 · 0 comments

@eyal-rounds

Traceback (most recent call last):
  File "duplicate_finder.py", line 358, in <module>
    add(args['<path>'], db)
  File "duplicate_finder.py", line 194, in add
    for result in hash_files_parallel(files):
  File "duplicate_finder.py", line 159, in hash_files_parallel
    for result in executor.map(hash_file, files):
  File "/usr/lib/python3.5/concurrent/futures/process.py", line 484, in map
    timeout=timeout)
  File "/usr/lib/python3.5/concurrent/futures/_base.py", line 548, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib/python3.5/concurrent/futures/_base.py", line 548, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib/python3.5/concurrent/futures/process.py", line 139, in _get_chunks
    chunk = tuple(itertools.islice(it, chunksize))
  File "duplicate_finder.py", line 182, in new_image_files
    if _in_database(file, db):
  File "duplicate_finder.py", line 177, in _in_database
    return db.count({"_id": file}) > 0
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1560, in count
    return self._count(cmd, collation, session)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 1511, in _count
    session=session)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/collection.py", line 245, in _command
    retryable_write=retryable_write)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 522, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 677, in _raise_connection_failure
    raise error
  File "/usr/local/lib/python3.5/dist-packages/pymongo/pool.py", line 517, in command
    collation=collation)
  File "/usr/local/lib/python3.5/dist-packages/pymongo/network.py", line 103, in command
    None, codec_options, check_keys)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd7' in position 49: surrogates not allowed

Adding some debug printing suggests the problem is a file whose path contains a Hebrew (Unicode) name:

```
#### CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='replace', tzinfo=None) 4 image_database.$cmd SON([('count', 'images'), ('query', {'_id': "/home/cmp/all_our_pics/15-04-2017 Maytal's phone/\udcd7\udc9e\udcd7\udca1\udcd7\udc9e\udcd7\udc9b\udcd7\udc99\udcd7\udc9d/20151015_130430.jpg"}), ('lsid', {'id': Binary(b'\x8c&\xf7\x96\xcf\xf0OZ\xb95\xaaf+\xd2\x91\x98', 4)})]) False
```

The print statement ```print("####", str(codec_options), str(flags), str(ns), str(spec), str(check_keys))``` was added at line 101 of /usr/local/lib/python3.5/dist-packages/pymongo/network.py.

As you can see, the "undecodable" character is Unicode, but it *CAN* be encoded as UTF-8 using Python; my scripts have done this in the past. I'm not sure what the issue is here.
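
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the `\udcXX` characters in the printed `_id` look like the lone surrogates that Python's `surrogateescape` error handler produces when it cannot decode filename bytes. Strict UTF-8 encoding refuses lone surrogates, which would match the `surrogates not allowed` message. The snippet below uses the directory name from the debug output. `repair_surrogates` is a hypothetical helper, not part of `duplicate_finder.py` or pymongo, and it assumes the bytes on disk are valid UTF-8.

```python
# Directory name copied from the debug output in this issue.
escaped = "\udcd7\udc9e\udcd7\udca1\udcd7\udc9e\udcd7\udc9b\udcd7\udc99\udcd7\udc9d"


def repair_surrogates(path: str) -> str:
    """Hypothetical helper: turn surrogate-escaped bytes back into real text.

    Assumes the original filename bytes were UTF-8; raises
    UnicodeDecodeError if they were not.
    """
    raw = path.encode("utf-8", "surrogateescape")
    return raw.decode("utf-8")


# Strict UTF-8 encoding rejects lone surrogates (the error in the traceback).
try:
    escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogateescape, each \udcXX maps back to the single byte 0xXX.
raw_bytes = escaped.encode("utf-8", "surrogateescape")

# Decoding those bytes as UTF-8 yields ordinary Hebrew code points.
repaired = repair_surrogates(escaped)

# Decoding the same bytes as ASCII with surrogateescape reproduces the
# escaped form, which is what a non-UTF-8 locale could produce (assumption).
roundtrip = raw_bytes.decode("ascii", "surrogateescape")

print(strict_ok, raw_bytes)
```

If this is what is happening, checking the locale the script runs under (for example `LANG` or `LC_ALL`) may be worth a try. Repairing the string before it reaches `db.count` is another option, with the caveat that the repaired `_id` would no longer be the exact string Python uses for that path in the current environment.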