Overhaul of the architecture #25

Open: wants to merge 1 commit into master

219 changes: 195 additions & 24 deletions kaitaistruct.py
@@ -1,7 +1,12 @@
import typing
import itertools
import sys
import struct
from io import open, BytesIO, SEEK_CUR, SEEK_END # noqa
from io import IOBase, BufferedIOBase
import mmap
from pathlib import Path
from abc import ABC, abstractmethod

PY2 = sys.version_info[0] == 2

@@ -24,51 +29,216 @@
# pylint: disable=useless-object-inheritance,super-with-arguments,consider-using-f-string


class KaitaiStruct(object):
    def __init__(self, stream):
        self._io = stream
class _NonClosingNonParsingKaitaiStruct:
    __slots__ = ("_io", "_parent", "_root")

    def __init__(self, _io: "KaitaiStream", _parent: typing.Optional["_NonClosingNonParsingKaitaiStruct"] = None, _root: typing.Optional["_NonClosingNonParsingKaitaiStruct"] = None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self


class NonClosingKaitaiStruct(_NonClosingNonParsingKaitaiStruct, ABC):
    __slots__ = ()

    @abstractmethod
    def _read(self):
        raise NotImplementedError()


class KaitaiStruct(NonClosingKaitaiStruct):
    __slots__ = ("_shouldExit",)
    def __init__(self, io: typing.Union["KaitaiStream", Path, bytes, str]):
        if not isinstance(io, KaitaiStream):
            io = KaitaiStream(io)
        super().__init__(io)
        self._shouldExit = False

    def __enter__(self):
        self._shouldExit = not self._io.is_entered
        if self._shouldExit:
            self._io.__enter__()
Contributor

I'm wondering if we should assume that __enter__ on IO streams never does anything: that would simplify our own __enter__ implementations a lot. I think this is a safe assumption, because you can always use streams with manual close calls instead of with.
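A quick way to check that assumption against the standard library is sketched below (a minimal illustration, not part of the PR; the helper name enter_returns_self is made up here). For an io.BytesIO and for a binary file from open(), __enter__ returns the very same object; its only other effect is to raise ValueError if the stream is already closed.

```python
import io
import os
import tempfile


def enter_returns_self(stream: io.IOBase) -> bool:
    """True if entering the stream hands back the very same object."""
    return stream.__enter__() is stream


mem = io.BytesIO(b"\x00\x01")  # in-memory stream

fd, path = tempfile.mkstemp()  # scratch file on disk
os.close(fd)
disk = open(path, "rb")  # buffered binary file object

results = [enter_returns_self(mem), enter_returns_self(disk)]

# The only other thing __enter__ does is reject a stream that is already closed.
disk.close()
os.remove(path)
try:
    disk.__enter__()
    refused = False
except ValueError:
    refused = True

print(results, refused)  # [True, True] True
```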

Contributor Author
KOLANICH, Jan 26, 2020

IMHO, close should be removed in favour of __exit__, and constructing shouldn't mean __enter__ing in Python 4 for some built-in types. Yes, I am mad enough to assume that there will someday be a Python 4 fixing the mistakes of Python 3. And I sincerely hope it will happen. In fact, I have a wishlist of changes I want to see in Python that are impossible to implement without breaking compatibility.

Contributor

> IMHO, close should be removed in favour of __exit__, and constructing shouldn't mean __enter__ing in Python 4 for some built-in types.

That's unlikely to happen. The general convention in the stdlib seems to be that you're never forced to use with (or __enter__/__exit__), because there are always other ways to do what with would do (such as close on streams).

> Yes, I am mad enough to assume that there will someday be a Python 4 fixing the mistakes of Python 3. And I sincerely hope it will happen. In fact, I have a wishlist of changes I want to see in Python that are impossible to implement without breaking compatibility.

That will never happen. I can't find a quote for this right now, but the core devs have said that Python 4 won't be a big release that breaks everything like Python 3. I don't think that the devs (or the community) are interested in another messy incompatible version switch like Python 2 to 3 was.

        return self

    def __exit__(self, *args, **kwargs):
        self.close()
        if self._shouldExit:
            self._io.__exit__(*args, **kwargs)
Contributor

Similar to the comment about __enter__ above, I think it's safe to assume that __exit__ on IO streams is equivalent to calling close.
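The equivalence can be checked the same way. In this minimal sketch (plain io.BytesIO objects, nothing from the PR), leaving a with block, calling close(), and calling __exit__ by hand all leave the stream closed, and __exit__ returns a falsy value, so it never suppresses an exception.

```python
import io

# Leaving a with-block closes the stream...
with io.BytesIO(b"abc") as a:
    pass

# ...just like an explicit close() does.
b = io.BytesIO(b"abc")
b.close()

# Calling __exit__ by hand also closes it and returns a falsy value,
# so an exception raised inside a with-block is never swallowed.
c = io.BytesIO(b"abc")
suppressed = c.__exit__(None, None, None)

print(a.closed, b.closed, c.closed, bool(suppressed))  # True True True False
```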


    def close(self):
        self._io.close()
Contributor

Is there a reason why KaitaiStruct.close should be removed?

Contributor Author

We have __exit__.

    @classmethod
    def from_any(cls, o: typing.Union[Path, str]) -> "KaitaiStruct":
        with KaitaiStream(o) as io:
            s = cls(io)
            s._read()
        return s
Contributor

Does this work with specs that use pos instances? Those are compiled to lazy properties, which read their data from the stream only once they are accessed. AFAICT this would break if the file is closed early like this.

Contributor Author

Yes, for these cases explicit context management is needed:

with OurStructName(KaitaiStream(Path("./aaa"))) as s:
    doNeededStuff(s)

Or maybe I should even try to construct the stream automatically, at the cost of an additional check.
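The failure mode can be reproduced without the runtime. LazyDemo below is a hypothetical stand-in for compiler output, not generated code: it parses one byte eagerly and exposes a pos-style instance as a property that seeks and reads on first access. Touching that property after a with block (the from_any() pattern) has closed the stream raises ValueError, while keeping the stream open for the object's whole lifetime works.

```python
import io
import struct


class LazyDemo:
    """Hypothetical stand-in for a compiled spec with one `pos` instance."""

    def __init__(self, stream: io.BytesIO):
        self._io = stream
        self.magic = b""
        self._m_trailer = None  # cache for the lazy instance

    def _read(self):
        # Eagerly parsed seq field: the first byte.
        self.magic = self._io.read(1)

    @property
    def trailer(self) -> int:
        # Lazy `pos: 4` instance: seek and read a u2le only on first access.
        if self._m_trailer is None:
            saved = self._io.tell()
            self._io.seek(4)
            (self._m_trailer,) = struct.unpack("<H", self._io.read(2))
            self._io.seek(saved)
        return self._m_trailer


data = b"\x7f\x00\x00\x00\x34\x12"

# from_any()-style: parse inside `with`, use the object afterwards.
with io.BytesIO(data) as stream:
    early = LazyDemo(stream)
    early._read()
try:
    early.trailer  # first access happens after the stream was closed
    failed_after_close = False
except ValueError:
    failed_after_close = True

# Explicit context management keeps the stream open while the object is used.
with io.BytesIO(data) as stream:
    kept = LazyDemo(stream)
    kept._read()
    value = kept.trailer

print(failed_after_close, hex(value))  # True 0x1234
```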


    @classmethod
    def from_file(cls, filename):
        f = open(filename, 'rb')
        try:
            return cls(KaitaiStream(f))
        except Exception:
            # close file descriptor, then reraise the exception
            f.close()
            raise
    def from_file(cls, file: typing.Union[Path, str, BufferedIOBase], use_mmap: bool = True) -> "KaitaiStruct":
        return cls.from_any(file, use_mmap=use_mmap)

    @classmethod
    def from_bytes(cls, buf):
        return cls(KaitaiStream(BytesIO(buf)))
    def from_bytes(cls, data: bytes) -> "KaitaiStruct":
        return cls.from_any(data)

    @classmethod
    def from_io(cls, io):
        return cls(KaitaiStream(io))
KOLANICH marked this conversation as resolved.
    def from_io(cls, io: IOBase) -> "KaitaiStruct":
        return cls.from_any(io)


class IKaitaiDownStream(ABC):
    __slots__ = ("_io",)

    def __init__(self, _io: typing.Any):
        self._io = _io

    @property
    @abstractmethod
    def is_entered(self):
        raise NotImplementedError

    @abstractmethod
    def __enter__(self):
        raise NotImplementedError()

    def __exit__(self, *args, **kwargs):
        if self.is_entered:
            self._io.__exit__(*args, **kwargs)
            self._io = None


class KaitaiStream(object):
    def __init__(self, io):
class KaitaiIODownStream(IKaitaiDownStream):
    __slots__ = ()

    def __init__(self, data: typing.Any):
        super().__init__(data)

    @property
    def is_entered(self):
        return isinstance(self._io, IOBase)

    def __enter__(self):
        if not self.is_entered:
            self._io = open(self._io).__enter__()
        return self


class KaitaiBytesDownStream(KaitaiIODownStream):
    __slots__ = ()

    def __init__(self, data: bytes):
        super().__init__(data)


class KaitaiFileSyscallDownStream(KaitaiIODownStream):
    __slots__ = ()

    def __init__(self, io: typing.Union[Path, str, IOBase]):
        if isinstance(io, str):
            io = Path(io)
        super().__init__(io)


class KaitaiRawMMapDownStream(KaitaiIODownStream):
    __slots__ = ()

    def __init__(self, io: typing.Union[mmap.mmap]):
        super().__init__(None)
        self._io = io
        self.align_to_byte()

    @property
    def is_entered(self):
        return isinstance(self._io, mmap.mmap)

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        super().__exit__(*args, **kwargs)


class KaitaiFileMapDownStream(KaitaiRawMMapDownStream):
    __slots__ = ("file",)

    def __init__(self, io: typing.Union[Path, str, IOBase]):
        super().__init__(None)
        self.file = KaitaiFileSyscallDownStream(io)

    @property
    def is_entered(self):
        return isinstance(self._io, mmap.mmap)

    def __enter__(self):
        self.file = self.file.__enter__()
        self._io = mmap.mmap(self.file.file.fileno(), 0, access=mmap.ACCESS_READ).__enter__()
        return self

    def __exit__(self, *args, **kwargs):
        self.close()
        super().__exit__(*args, **kwargs)
        if self.file is not None:
            self.file.__exit__(*args, **kwargs)
            self.file = None


def get_file_down_stream(path: Path, *args, use_mmap: bool = True, **kwargs) -> IKaitaiDownStream:
    if use_mmap:
        cls = KaitaiFileMapDownStream
    else:
        cls = KaitaiFileSyscallDownStream

    return cls(path, *args, **kwargs)


def get_mmap_downstream(mapping: mmap.mmap):
    return KaitaiRawMMapDownStream(mapping)


    def close(self):
        self._io.close()
downstreamMapping = {
    bytes: KaitaiBytesDownStream,
    BytesIO: KaitaiBytesDownStream,
    str: get_file_down_stream,
    Path: get_file_down_stream,
    BufferedIOBase: get_file_down_stream,
    mmap.mmap: get_mmap_downstream,
}


def get_downstream_ctor(t) -> typing.Type[IKaitaiDownStream]:
    ctor = downstreamMapping.get(t, None)
    if ctor:
        return ctor
    types = t.mro()
    for t1 in types[1:]:
        ctor = downstreamMapping.get(t1, None)
        if ctor:
            downstreamMapping[t] = ctor
            return ctor
Contributor

Why not just isinstance?

Contributor Author

Complexity. Using isinstance would mean walking all the registered types and checking each one, which is O(n) in the number of types, and maybe more depending on how inheritance checks are implemented. I have tried to optimize it with a dict lookup, which should be O(1).
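A minimal sketch of the trade-off, assuming a small registry shaped like downstreamMapping (registry, wrap_bytes and open_file_like are hypothetical names). The dict probe plus cached MRO walk does give single-probe lookups for repeat types. It also shows a caveat worth checking: the concrete io classes such as io.BufferedReader are registered with the io.BufferedIOBase ABC rather than inheriting from it, so an MRO walk misses them while issubclass (the type-level counterpart of isinstance) finds them.

```python
import io
from typing import Callable, Dict, Optional


def open_file_like(x):
    """Hypothetical constructor standing in for get_file_down_stream."""
    return ("file", x)


def wrap_bytes(x):
    """Hypothetical constructor standing in for KaitaiBytesDownStream."""
    return ("bytes", x)


# Hypothetical registry shaped like the PR's downstreamMapping.
registry: Dict[type, Callable] = {bytes: wrap_bytes, io.BufferedIOBase: open_file_like}


def lookup_by_mro(t: type) -> Optional[Callable]:
    """One dict probe; on a miss, walk t's MRO and cache the hit under t."""
    if t in registry:
        return registry[t]
    for base in t.__mro__[1:]:
        if base in registry:
            registry[t] = registry[base]  # next lookup of t is a single probe
            return registry[t]
    return None


def lookup_by_issubclass(t: type) -> Optional[Callable]:
    """Linear scan; issubclass also honours ABCMeta.register() (virtual subclasses)."""
    for base, ctor in registry.items():
        if issubclass(t, base):
            return ctor
    return None


class Blob(bytes):
    """A real subclass: bytes appears in its MRO."""


blob_ctor = lookup_by_mro(Blob)  # found via the MRO walk, then cached
mro_hit = lookup_by_mro(io.BufferedReader)  # None: only a *virtual* subclass
subclass_hit = lookup_by_issubclass(io.BufferedReader)

print(blob_ctor is wrap_bytes, mro_hit, subclass_hit is open_file_like)  # True None True
```

One possible middle ground, not something the PR does, is to fall back to the issubclass scan on a miss and cache its result in the same dict, which keeps repeat lookups O(1) while still honouring ABC registration.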

    raise TypeError("Unsupported type", t, types)


def get_downstream(x: typing.Union[bytes, str, Path], *args, **kwargs) -> IKaitaiDownStream:
    return get_downstream_ctor(type(x))(x, *args, **kwargs)


class KaitaiStream():
    def __init__(self, o: typing.Union[bytes, str, Path, IKaitaiDownStream]):
        if not isinstance(o, IKaitaiDownStream):
            o = get_downstream(o)
        self._downstream = o
        self.align_to_byte()

    @property
    def _io(self):
        return self._downstream._io

    def __enter__(self):
        self._downstream.__enter__()
        return self

    @property
    def is_entered(self):
        return self._downstream is not None and self._downstream.is_entered

    def __exit__(self, *args, **kwargs):
        self._downstream.__exit__(*args, **kwargs)

    # region Stream positioning

@@ -454,6 +624,7 @@ class KaitaiStructError(Exception):
    Stores KSY source path, pointing to an element supposedly guilty of
    an error.
    """

    def __init__(self, msg, src_path):
        super(KaitaiStructError, self).__init__("%s: %s" % (src_path, msg))
        self.src_path = src_path