Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul of the architecture #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KOLANICH
Copy link
Contributor

started using memory-mapping files and added from_any method

@KOLANICH KOLANICH force-pushed the stream_improvements branch 2 times, most recently from 2b0acfc to ea75499 Compare March 22, 2018 18:23
@KOLANICH
Copy link
Contributor Author

@GreyCat

kaitaistruct.py Outdated
return cls.from_file(Path(file), map)
elif isinstance(file, Path):
with file.open("rb") as f:
return cls.from_file(f, map)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're losing this "try-catch" logic that we had before, which attempted to close file after failure.

I'm by no mean a Python expert, so I'm not sure if that logic made sense in the first place. Can you explan why have you removed it?

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're losing this "try-catch" logic that we had before, which attempted to close file after failure.

A context manager (with) is used. It is a replacement for try except and is something like RAII: when the block of a context manager is exitted the resource in its header is __exit__ed. In fact context managers are present in python for pretty long time, they were introduced in python 2.5 in 2008. Explicit usage of try except for the cases where a context manager is needed is considered to be an antipattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall that it was tried once, but I had to revert it: 7d0b226 — have you checked if it works now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't check the tests since they are not in this repo.

Copy link
Contributor

@dgelessus dgelessus Jan 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case you actually cannot use with. In from_file you normally want the file to stay open so that the returned KaitaiStream can use it, and once the KaitaiStream is closed, the underlying file is also closed. But if an exception is thrown while creating the KaitaiStream, you need to closed the file manually, because then from_file doesn't return a KaitaiStream object that the user could use to close the file. with will always close the file at the end of the block, regardless of whether an exception is thrown, which is not the desired behavior here, so the manual try/except solution is actually necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right.

kaitaistruct.py Outdated Show resolved Hide resolved
@KOLANICH KOLANICH force-pushed the stream_improvements branch 3 times, most recently from 9ff92ab to d9742df Compare January 25, 2020 19:05
@KOLANICH KOLANICH changed the title Improved file reading Overhaul of the architecture Jan 25, 2020
@KOLANICH KOLANICH force-pushed the stream_improvements branch 2 times, most recently from 6e902e5 to 421d79c Compare January 25, 2020 19:36
@KOLANICH
Copy link
Contributor Author

I have tried to refactor the architecture of the runtime to make it composable. It would require some modifications to KSC.

  1. KaitaiStruct is a class for use my a user
  2. _KaitaiStruct is a class for use by a compiler. It skips some checks related to entrance.
  3. KaitaiParser is a class to encapsulate parsing logic.
  4. IKaitaiDownStream is an interface describing low-level interacton to streams. It manages the underlying resource.
  5. KaitaiStream inherits methods and props from classes implementing IKaitaiDownStream via passthrough.
  6. KaitaiStream manages IKaitaiDownStream
  7. KaitaiStream is managed by a user explicitly using context managers. If a user needs just parsing, he should use from_any. If a user needs serialization, he has to manage lifetime of the stream explicitly.

Copy link
Contributor

@dgelessus dgelessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm coming a bit late to this PR, so maybe there's some context I'm missing. What's the motivation behind adding support for memory-mapped IO? Do you have a particular use case where the stream-based IO operations are a performance bottleneck? Also, as far as I can tell, the runtime uses the memory-mapped file exactly like a stream-based file - does memory-mapped IO offer any actual performance benefits this way?

The structure of the code is unfortunately not very clear to me, even with the explanations in #25 (comment). The code uses a number of ABCs, but with no documentation about why they exist, who is meant to extend/implement them, and what the expected behavior for their abstract methods is. More specifically:

  • What's the need for splitting KaitaiStruct into a hierarchy of three classes (_KaitaiStruct_, _KaitaiStruct, KaitaiStruct)? Is there any advantage to using the superclasses, which have less functionality than KaitaiStruct? Also, it's quite confusing to have three classes with identical names except for the number of underscores.
  • What's the need for IKaitaiDownStream and its implementations? All objects that it wraps (file objects, io.BytesIO, mmap.mmap) support normal Python stream operations, so there should be no need for a wrapper interface/ABC.
  • What's the purpose of KaitaiParser? It appears to be completely unimplemented.

Some general points about the code:

  • There are some blatant errors in the code that make it impossible to import, let alone run correctly. I know you said that changes in the compiler would be needed to use this new version of the runtime, so I can understand that you haven't run a full test suite over this. But please do some basic testing before you submit this as a PR and ask for changes in the compiler. (Or if the PR is not yet finished, please mark it as WIP somehow.)
  • This PR breaks compatibility with Python 3.4 and older, including Python 2.7. (typing was added to the stdlib in 3.5, pathlib was added in 3.4, function type annotation syntax was added in 3.0). I don't know if dropping support for 3.4 and older is acceptable for KS, it would depend on what versions are frequently used in the KS userbase. In any case, I think this should be discussed and implemented separately.
  • This PR includes type hints in some places, but not all. If we decide to use type hints, they should be applied consistently everywhere, but again I think that should be a separate PR.
  • The explanations from Overhaul of the architecture #25 (comment) should be included in the code as comments/docstrings in the relevant places. Future readers of the code shouldn't have to read a PR discussion to understand the architecture.
  • Many identifiers use dromedaryCase, which is not the usual Python naming convention.

kaitaistruct.py Outdated
import itertools
import sys
import struct
from io import open, BytesIO, SEEK_CUR, SEEK_END # noqa
from _io import _IOBase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't import _io._IOBase (or anything else from _io), it's an undocumented implementation detail with no guaranteed behavior. The public documented class to use here is io.IOBase.

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path("./a.txt").open("rb").__class__.mro() [<class '_io.BufferedReader'>, <class '_io._BufferedIOBase'>, <class '_io._IOBase'>, <class 'object'>]

I don't see io.IOBase in __mro__, it is probably not inherited from it. In fact it is io.IOBase inherited from _io._IOBase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment below - io.IOBase, io.RawIOBase, and io.BufferedIOBase are ABCs and have the standard stream types registered as virtual subclasses. So even if io.IOBase doesn't appear in the MRO, isinstance(Path(...).open("rb"), io.IOBase) will return True.

kaitaistruct.py Outdated
class KaitaiStruct(object):
def __init__(self, stream):
self._io = stream
class _KaitaiStruct_:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's quite confusing to have classes _KaitaiStruct_ and _KaitaiStruct. Also, a single trailing underscore is quite unusual in a Python identifier, especially in a class name.

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any ideas how to better name it? Underscores are usual convention for marking something as use with care, so _KaitaiStruct_ is double-use with care, but I shouldn't prepend a double underscore because it is reserved in pythonk so I have appended it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscores are the usual convention for internal things that you're not really supposed to touch as an external user, so it's not a good naming convention for classes that users are meant to subclass...

Why not KaitaiStructBase for the base class, and something like NonClosingKaitaiStruct for the other class?

kaitaistruct.py Outdated
_parser = None # :KaitaiParser

def _read(self):
self.__class__._parser.parse(self, self._io)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where/when/how is _parser supposed to be set? Currently it's always None, so this is guaranteed to fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is meant to be set in the code generated by KSC.


def __enter__(self):
self._shouldExit = not stream.is_entered
if self._shouldExit:
self._io.__enter__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should assume that __enter__ on IO streams never does anything - it would simplify our own __enter__ implementations a lot. I think this is a safe assumption, because you can always use streams with manual close calls instead of with.

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO - close should be removed in favour of __exit__ and constructing shouldn't mean __enter__ing in python 4 for some built-in types. Yes, I am mad enough to assumme that there will be python 4 fixing the mistakes of python 3 somewhen. And I sincerely hope it will happen. In fact I have a wishlist of the changes it is impossible to implement without breaking compatibility I wanna see in python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO - close should be removed in favour of __exit__ and constructing shouldn't mean __enter__ing in python 4 for some built-in types.

That's unlikely to happen. The general convention in the stdlib seems to be that you're never forced to use with (or __enter__/__exit__), because there are always other ways to do what with would do (such as close on streams).

Yes, I am mad enough to assumme that there will be python 4 fixing the mistakes of python 3 somewhen. And I sincerely hope it will happen. In fact I have a wishlist of the changes it is impossible to implement without breaking compatibility I wanna see in python.

That will never happen. I can't find a quote for this right now, but the core devs have said that Python 4 won't be a big release that breaks everything like Python 3. I don't think that the devs (or the community) are interested in another messy incompatible version switch like Python 2 to 3 was.

def close(self):
self._io.close()
if self.shouldExit:
self._io.__exit__(*args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the comment about __enter__ above, I think it's safe to assume that __exit__ on IO streams is equivalent to calling close.

with KaitaiStream(o) as io:
s = cls(io)
s._read()
return s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with specs that use pos instances? Those are compiled to lazy properties, which read their data from the stream only once they are accessed. AFAICT this would break if the file is closed early like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for these case explicit context management is needed.

with OurStructName(KaitaiStream(Path("./aaa"))) as s:
    doNeededStuff(s)

Or maybe I even need to try to construct the stream automatically at the cost of an additional check.

kaitaistruct.py Show resolved Hide resolved
kaitaistruct.py Outdated Show resolved Hide resolved
kaitaistruct.py Outdated

@property
def is_entered(self):
return isinstance(self._io, _IOBase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost guaranteed to not work, because most built-in IO streams don't really extend _io._IOBase. You need to use io.IOBase instead - that will work, because it is an ABC and has all built-in IO stream classes registered as virtual subclasses.

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path("./a.txt").open("rb").__class__.mro()
[<class '_io.BufferedReader'>, <class '_io._BufferedIOBase'>, <class '_io._IOBase'>, <class 'object'>]

also

BytesIO.mro()
[<class '_io.BytesIO'>, <class '_io._BufferedIOBase'>, <class '_io._IOBase'>, <class 'object'>]

on 3.7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I remembered that incorrectly - apparently the streams from _io do actually extend _io._IOBase.

In any case, _io is not a documented module, so there's no guarantee that _io._IOBase will be used that way in the future. io.IOBase is documented and does exactly the same thing here, so you should really use that instead.

kaitaistruct.py Outdated


def get_downstream(x: typing.Union[bytes, str, Path], *args, **kwargs) -> IKaitaiDownStream:
return downstreamMapping[type(x)](x, *args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work correctly for multiple reasons:

  • On Unix systems, Python accepts both str and bytes as paths, even in Python 3. This is to allow working with paths that have no consistent encoding. This isn't handled correctly here - the bytes path will instead be treated as the contents of a stream.
  • The lookup will fail if type(x) is a subclass of one of the types listed in the mapping. This is unlikely to be a problem for bytes and str, but is guaranteed to break for Path, because there are no instances of Path itself (only of its subclasses, WindowsPath or PosixPath).
  • There is no way to create an IKaitaiDownStream from an existing IO stream, which means there's no way to implement KaitaiStream.from_io based on this function.

Copy link
Contributor Author

@KOLANICH KOLANICH Jan 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Unix systems, Python accepts both str and bytes as paths, even in Python 3. This is to allow working with paths that have no consistent encoding.

Thanks. Haven't known about this fact.

This isn't handled correctly here - the bytes path will instead be treated as the contents of a stream.

How are we going to address it? Path doesn't seem to support bytes at all, so we can consider it as dropped. Should they be decoded into strings somehow?

The lookup will fail if type(x) is a subclass of one of the types listed in the mapping. This is unlikely to be a problem for bytes and str, but is guaranteed to break for Path, because there are no instances of Path itself (only of its subclasses, WindowsPath or PosixPath).

Path("./a.txt").__class__.mro()

[pathlib.PosixPath,
 pathlib.Path,
 pathlib.PurePosixPath,
 pathlib.PurePath,
 object]

Will be fixed.

  • There is no way to create an IKaitaiDownStream from an existing IO stream, which means there's no way to implement KaitaiStream.from_io based on this function.

It's by design. If one needs a stream, he must write a wrapper implementing IKaitaiDownStream.

@KOLANICH
Copy link
Contributor Author

KOLANICH commented Jan 26, 2020

I'm coming a bit late to this PR, so maybe there's some context I'm missing. What's the motivation behind adding support for memory-mapped IO?

Mainly:

  • theoretical possibility to work with files larger than physical RAM.
  • serialization support.

All of it would require major changes in the code generated by the KSC and the runtime. But they are not part of this PR, let's do it step by step.

The plan is to store only the metadata in additional memory and do all the operations on the mapped memory directly. So when reading for example a u1 we just calc its offset in the map and read it from there. When writing we just write by the offset. The OS automatically swaps pages.

Of course it is suitable not for all use cases so a user should be able to choose which IO model he needs.

Do you have a particular use case where the stream-based IO operations are a performance bottleneck? Also, as far as I can tell, the runtime uses the memory-mapped file exactly like a stream-based file - does memory-mapped IO offer any actual performance benefits this way?

I haven't done any measurements about this in fact.

* What's the need for splitting `KaitaiStruct` into a hierarchy of three classes (`_KaitaiStruct_`, `_KaitaiStruct`, `KaitaiStruct`)? Is there any advantage to using the superclasses, which have less functionality than `KaitaiStruct`? Also, it's quite confusing to have three classes with identical names except for the number of underscores.
* What's the purpose of `KaitaiParser`? It appears to be completely unimplemented.

_KaitaiStruct doesn't have checks on openness, KaitaiStruct does. Both use composition for a parser.
The idea is following:

  1. KSC implements KaitaiParser with parsing code that is currently in _read.
  2. KSC inherits either KaitaiStruct, setting its _parser to the generated parser and also creating props and __slots__. The parsing and serialization machinery goes to KaitaiParser-inherited class (we may need a better name), the storage and accessor machinery goes to _KaitaiStruct-inherited class.
  3. KSC uses _KaitaiStruct (via super()) where no openness checks are needed, such as parsing struct within seq fields.
  4. _KaitaiStruct_ is meant to be used by user-implemented opaque types when no machinery of _KaitaiStruct is needed.
* What's the need for `IKaitaiDownStream` and its implementations? All objects that it wraps (file objects, `io.BytesIO`, `mmap.mmap`) support normal Python stream operations, so there should be no need for a wrapper interface/ABC.

To support context management properly.

  • Python streams have a flaw because of compatibility to python 2 - they are entered when constructed. The wrappers enter them in __enter__ by delaying their construction till that moment, if that is needed. * It also provides a magic to detect wheter an object is __enter__ed.
  • And a magic to make sure it is __exit__ed when deleted.

Some general points about the code:

* There are some blatant errors in the code that make it impossible to import, let alone run correctly. I know you said that changes in the compiler would be needed to use this new version of the runtime, so I can understand that you haven't run a full test suite over this. But please do some basic testing before you submit this as a PR and ask for changes in the compiler. (Or if the PR is not yet finished, please mark it as WIP somehow.)
* The explanations from [#25 (comment)](https://github.com/kaitai-io/kaitai_struct_python_runtime/pull/25#issuecomment-578437146) should be included in the code as comments/docstrings in the relevant places. Future readers of the code shouldn't have to read a PR discussion to understand the architecture.

I currently consider this as a draft, it will likely be changed a lot. I have pushed it to just get your feedback about the proposed architecture.

* Many identifiers use dromedaryCase, which is not the usual Python naming convention.

Don't see any problem in fixing it.

* This PR breaks compatibility with Python 3.4 and older, including Python 2.7. (`typing` was added to the stdlib in 3.5, `pathlib` was added in 3.4, function type annotation syntax was added in 3.0).

It is officially EOL. Typing is added for clarity, this PR adds a lot of complexity, so it also need something for clarity in order not to get mad. I can remove the typing if 2.7 support is still needed. Or we can try to use the stuff like 3to2 for that.

* This PR includes type hints in some places, but not all. If we decide to use type hints, they should be applied consistently everywhere, but again I think that should be a separate PR.

Yes, I gonna use monkeytype for that. They can be added later.

@dgelessus
Copy link
Contributor

  • theoretical possibility to work with files larger than physical RAM.

I don't have a lot of experience with mmap in Python, but from what I've read, the data from the file is copied by default. This happens regardless of whether you use the bytearray-like interface (f[12:34]) or the IO-stream-like interface (f.read(123)). The reason for this is probably that the data is returned as a bytes object, which is supposed to be immutable, so the data has to be copied to ensure that it cannot change (which could happen if the mmap file is opened as writable and you write to a position that you previously read from).

To prevent this automatic copying behavior and save memory as you intended, I think you need to wrap the mmap file in a memoryview, which lets you slice the memory without copying it. Unfortunately memoryview doesn't support the usual Python IO stream methods (unlike mmap), so you'd probably have to write a wrapper class that adds an IO stream interface on top of memoryview. (io.BytesIO doesn't work here - you can pass it a memoryview object, but that copies its contents and doesn't reuse the memory.)

_KaitaiStruct doesn't have checks on openness, KaitaiStruct does.

  1. KSC uses _KaitaiStruct (via super()) where no openness checks are needed, such as parsing struct within seq fields.

Do you mean that if a type has only seq fields and no instances, it would not support __enter__/__exit__? I'm not sure if that's a good idea - this way there would be some types that require with to be used properly, and others that cannot use with.

  1. _KaitaiStruct_ is meant to be used by user-implemented opaque types when no machinery of _KaitaiStruct is needed.

I assume this would be optional? Because currently opaque types don't require any specific base class. Also, I'm not sure if the constructor signature is correct - as far as I can tell, opaque types only receive an _io argument, and never _parent and _root.

  1. KSC implements KaitaiParser with parsing code that is currently in _read.
  2. KSC inherits either KaitaiStruct, setting its _parser to the generated parser and also creating props and __slots__. The parsing and serialization machinery goes to KaitaiParser-inherited class (we may need a better name), the storage and accessor machinery goes to _KaitaiStruct-inherited class.

OK, but what's the reason for separating the parser from the struct? The two are still tightly coupled - each parser can only work with one struct type, and you cannot create a struct object without a parser.

  • Python streams have a flaw because of compatibility to python 2 - they are entered when constructed.

I don't think it's a compatibility thing... the open function just does exactly what the name says :)

The wrappers enter them in __enter__ by delaying their construction till that moment, if that is needed.

I don't think this behavior is a good idea. It's different from how every other IO stream in Python works. There's also no good way to use these classes without with, which is still necessary in some cases, and in the REPL it's just more convenient.

ctor = downstreamMapping.get(t1, None)
if ctor:
downstreamMapping[t] = ctor
return ctor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just isinstance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complexity. Doing isinstance would mean we would have to walk all the types and check each, this is O(n) and maybe more depending on inheritance impl. I have tried to optimize it by a lookup in a dict that should be O(1).

@KOLANICH
Copy link
Contributor Author

Do you mean that if a type has only seq fields and no instances, it would not support enter/exit?

No, I mean either the top type manages the KaitaiStream or programmer-written code manages it. All the subtypes are meant to be used only within the top one. So a context of a KaitaiStream is meant to be already entered when using any of subtypes, so they don't need an own context manager.

others that cannot use with.

If we deal with a resource that have to be explicitly managed we must use either context management magics or their surrogates (i.e. open, close), the alternative is resource leak. If one doesn't want with he can call the magics explicitly and take care of exceptions himself.

There's also no good way to use these classes without with

It's a feature.

as far as I can tell, opaque types only receive an _io argument, and never _parent and _root.

IDK what to answer now, needs some testing.

OK, but what's the reason for separating the parser from the struct?

No strong reasons, just "composition over inheritance" mindset.

don't think it's a compatibility thing... the open function just does exactly what the name says :)

It is a compatibility thing - keeping open instead of replacing it with File.

@dgelessus
Copy link
Contributor

No, I mean either the top type manages the KaitaiStream or programmer-written code manages it. All the subtypes are meant to be used only within the top one. So a context of a KaitaiStream is meant to be already entered when using any of subtypes, so they don't need an own context manager.

OK, I understand what you mean, but I don't think we can differentiate between these two cases at the type level. A top-level type isn't always the type that manages the IO stream (for example if the type is imported and used by another spec), and the type that manages the IO stream isn't always the top-level type of a spec (you can directly use non-top-level types in your application code, e. g. the_spec.TheSpec.SomeHelperType.from_bytes(b"...")). You would need to handle this at the object level, possibly with an extra constructor parameter to indicate whether the KaitaiStruct in question "owns" the IO stream.

In general I think this would be a good idea though, as it would prevent users from accidentally withing or closeing a KaitaiStruct object that doesn't own the stream that it uses.

There's also no good way to use these classes without with

It's a feature.

I understand that you want to make people do proper resource management, but removing close is not the way to do that, because it makes it harder to do proper resource management in cases where with is not an option. It's much cleaner to write thing.close() than thing.__exit__(None, None, None).

It is a compatibility thing - keeping open instead of replacing it with File.

If the Python devs wanted to change that, they would have done so with the io module that was introduced with Python 3 (and then backported to Python 2.6 and 2.7), or with pathlib that was introduced in Python 3.4. But they didn't - in both cases they kept the behavior of the old open function.

In any case, Python's IO stream behavior is what it is - streams are fully initialized/opened/working when they are created, __enter__ does nothing, and both close and __exit__ close the stream forever. I think KS should follow that standard behavior, unless there's a good reason to implement things differently than every other Python library out there.

@KOLANICH
Copy link
Contributor Author

A top-level type isn't always the type that manages the IO stream (for example if the type is imported and used by another spec)

Yes and no. KaitaiStruct checks if the stream is opened, if it is, it means that someone else manages it. If it isn't than the struct manages it.

and the type that manages the IO stream isn't always the top-level type of a spec (you can directly use non-top-level types in your application code, e. g. the_spec.TheSpec.SomeHelperType.from_bytes(b"..."))

Yes. That's why the compiler should inherit everythjng from KaitaiStruct but use the internal stuff via super() (not sure if it is possible to construct it this way) in order not to call full KaitaiStruct.__init__. Another concern here is that super() may have more overhead than the overhead related to full KaitaiStruct. It has to be measured, I cannot even estimate the chances.

but removing close is not the way to do that, because it makes it harder to do proper resource management in cases where with is not an option. It's much cleaner to write thing.close() than thing.exit(None, None, None).

IMHO, magics cleaner, they make it explicit that we are dealing with a context manager and that we are doing something needing care. The Nones can be defaulted.

I think KS should follow that standard behavior, unless there's a good reason to implement things differently than every other Python library out there.

The whole point of context managers is to get rid of manual freeing. If we make the type to __enter__ on creation, this would mean that anuser would have to do it himself.

BTW: should we allow the type to be __entered__ after it has been __exitted__? IMHO we should.

dgelessus added a commit to dgelessus/kaitai_struct_doc that referenced this pull request Oct 30, 2020
* Added instructions on how to install the tools needed to build and
  publish the Python runtime.
* Fixed formatting of __version__ variable name (the double underscores
  were previously recognized as Asciidoc italics syntax).
* Added step for deleting old built distributions before building new
  ones, so that files left over from previous builds can't be uploaded
  by accident.
* Added build of wheel in addition to source distribution to the build
  step (see kaitai-io/kaitai_struct_python_runtime#25).
* Added `twine check` step to perform basic checks on the distributions
  before uploading.
* Changed upload instructions from the deprecated `setup.py upload`
  command to the now recommended `twine upload`.
* Updated instructions for providing credentials when uploading. They
  now no longer need to be stored in an insecure plain text file.
* Updated PyPI URLs.
* Changed `git push` command so that only the just created tag is
  pushed, and not any other tags that might exist locally.
@KOLANICH KOLANICH force-pushed the stream_improvements branch 2 times, most recently from 2e4d3be to bf4ff4f Compare December 17, 2020 12:28
@KOLANICH KOLANICH force-pushed the stream_improvements branch 2 times, most recently from 1830da3 to 1f6b231 Compare April 12, 2022 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants