Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for complex number datatypes #4630

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

jhendersonHDF
Copy link
Collaborator

Adds the new datatype class H5T_COMPLEX

Adds the new API function H5Tcomplex_create which creates a complex number datatype from an ID of a base floating-point datatype

Adds the new feature check macros H5_HAVE_COMPLEX_NUMBERS and H5_HAVE_C99_COMPLEX_NUMBERS

Adds the new datatype size macros H5_SIZEOF_FLOAT_COMPLEX, H5_SIZEOF_DOUBLE_COMPLEX and H5_SIZEOF_LONG_DOUBLE_COMPLEX

Adds the new datatype ID macros H5T_NATIVE_FLOAT_COMPLEX, H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX, H5T_CPLX_IEEE_F16LE, H5T_CPLX_IEEE_F16BE,
H5T_CPLX_IEEE_F32LE, H5T_CPLX_IEEE_F32BE,
H5T_CPLX_IEEE_F64LE and H5T_CPLX_IEEE_F64BE

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Adds a special conversion path between complex number datatypes and array or compound datatypes where the in-memory layout of data is the same between the datatypes and data can be converted directly

Adds support for complex number datatypes to the h5dump, h5ls and h5diff/ph5diff tools. Allows h5dump '-m' option to change floating-point printing format for float complex and double complex datatypes, as well as long double complex if it has the same size as double complex

Adds minimal support to the h5watch and h5import tools

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the Java wrappers. Also adds initial, untested support to the JNI for future use with HDFView

Adds support for just the H5T_COMPLEX datatype class to the Fortran wrappers

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the high level library H5LT interface for use with the H5LTtext_to_dtype and H5LTdtype_to_text functions

Changes some usages of "complex" in the library since it conflicts with the "complex" keyword from the complex.h header. Also changes various usages of the word "complex" throughout the library to distinguish compound datatypes from complex datatypes.

Adds the new datatype class H5T_COMPLEX

Adds the new API function H5Tcomplex_create which creates a complex
number datatype from an ID of a base floating-point datatype

Adds the new feature check macros H5_HAVE_COMPLEX_NUMBERS and
H5_HAVE_C99_COMPLEX_NUMBERS

Adds the new datatype size macros H5_SIZEOF_FLOAT_COMPLEX,
H5_SIZEOF_DOUBLE_COMPLEX and H5_SIZEOF_LONG_DOUBLE_COMPLEX

Adds the new datatype ID macros H5T_NATIVE_FLOAT_COMPLEX,
H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX,
H5T_CPLX_IEEE_F16LE, H5T_CPLX_IEEE_F16BE,
H5T_CPLX_IEEE_F32LE, H5T_CPLX_IEEE_F32BE,
H5T_CPLX_IEEE_F64LE and H5T_CPLX_IEEE_F64BE

Adds hard and soft datatype conversion paths between complex number
datatypes and all the integer and floating-point datatypes, as well as
between other complex number datatypes

Adds a special conversion path between complex number datatypes and
array or compound datatypes where the in-memory layout of data is the
same between the datatypes and data can be converted directly

Adds support for complex number datatypes to the h5dump, h5ls and
h5diff/ph5diff tools. Allows h5dump '-m' option to change floating-point
printing format for float complex and double complex datatypes, as well
as long double complex if it has the same size as double complex

Adds minimal support to the h5watch and h5import tools

Adds support for the predefined complex number datatypes and
H5Tcomplex_create function to the Java wrappers. Also adds initial,
untested support to the JNI for future use with HDFView

Adds support for just the H5T_COMPLEX datatype class to the Fortran
wrappers

Adds support for the predefined complex number datatypes and
H5Tcomplex_create function to the high level library H5LT interface
for use with the H5LTtext_to_dtype and H5LTdtype_to_text functions

Changes some usages of "complex" in the library since it conflicts with
the "complex" keyword from the complex.h header. Also changes various
usages of the word "complex" throughout the library to distinguish
compound datatypes from complex datatypes.
@jhendersonHDF jhendersonHDF added Merge - Develop Only This cannot be merged to previous versions of HDF5 (file format or breaking API changes) Priority - 0. Blocker ⛔ This MUST be merged for the release to happen Component - C Library Core C library issues (usually in the src directory) Component - Tools Command-line tools like h5dump, includes high-level tools Component - High-Level Library Code in the hl directory Component - Documentation Doxygen, markdown, etc. Component - Java Java wrappers Component - Fortran Fortran wrappers Component - Testing Code in test or testpar directories, GitHub workflows Component - Build CMake, Autotools Type - New Feature Add a new API call, functionality, or tool Component - Misc Anything else (CODEOWNERS, etc.) labels Jul 8, 2024
@jhendersonHDF
Copy link
Collaborator Author

Additional complex number features that haven't been implemented yet are mentioned in #4631.

if (c_classtype == H5T_OPAQUE) *classtype = H5T_OPAQUE_F;
if (c_classtype == H5T_COMPOUND) *classtype = H5T_COMPOUND_F;
if (c_classtype == H5T_REFERENCE) *classtype = H5T_REFERENCE_F;
if (c_classtype == H5T_ENUM) *classtype = H5T_ENUM_F;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look like this commented out section was useful, but can be put back.

@@ -3915,7 +3915,7 @@ test_nbit_compound_2(hid_t file)
unsigned int v;
char b[2][2];
atomic d[2][2];
} complex;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure had to be renamed since "complex" is a keyword from complex.h and conflicted

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h5dump XML support was added and I verified it with a couple files, but no XML tests were added because the XML output always prints data (either dataset data or fill values) and long double/long double _Complex data displays differently across platforms due to sizeof(long double) == sizeof(double) with MSVC. h5dump would need some work for these tests to be portable.

*
* Return: void
*-------------------------------------------------------------------------
*/

static void
aux_tblinsert_filter(pack_opttbl_t *table, unsigned int I, filter_info_t filt)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to be changed since it is a keyword from the complex.h header and conflicted.

src/H5Tconv.c Show resolved Hide resolved
@@ -185,27 +177,47 @@ H5T_get_force_conv(const H5T_t *dt)
*-------------------------------------------------------------------------
*/
herr_t
H5T__reverse_order(uint8_t *rev, uint8_t *s, size_t size, H5T_order_t order)
H5T__reverse_order(uint8_t *rev, uint8_t *s, const H5T_t *dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -307,6 +348,23 @@ H5T__conv_order(const H5T_t *src, const H5T_t *dst, H5T_cdata_t *cdata,
} /* end if */
break;

case H5T_COMPLEX: {
const H5T_shared_t *src_base_sh = src->shared->parent->shared;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use H5T_cmp() here instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more compact, though H5T_cmp will compare the datatypes as different if they have different byte orders and then there's no information about whether or not that was the only thing different about the datatypes.

realval_type == H5T_CONV_FLOAT_SPECVAL_NEGINF ||
imagval_type == H5T_CONV_FLOAT_SPECVAL_POSINF ||
imagval_type == H5T_CONV_FLOAT_SPECVAL_NEGINF) {
H5T__bit_copy(d, dst_atomic.u.f.sign, s, src_atomic.u.f.sign, (size_t)1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you copy the global infinity values here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, as long as the type being converted matches one of the native types. Would probably make it a bit uglier here with special-case code and realistically this function is only ever going to be used for non-standard complex types like a complex of float16 anyway. I'm sure several of the library's software conversion functions could be heavily optimized like this, though I don't know if it's worth doing for this PR.

}
else {
/* There are many NaN values, so we just set all bits of the significand. */
if (realval_type == H5T_CONV_FLOAT_SPECVAL_NAN) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be worthwhile setting up globals for NANs also (in additional to infinities), so that memcpy() could be used here to speed things up.

}

if (real_zero) {
H5T__bit_copy(d, dst_atomic.u.f.sign, s, src_atomic.u.f.sign, (size_t)1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globals for memcpy() would be good here too.

FUNC_LEAVE_NOAPI(ret_value)
} /* end H5T__conv_complex_loop() */

/*-------------------------------------------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this code to a 'static inline' routine in H5Tconv.h and invoke it from H5T__conv_f_f_loop and H5T__conv_complex_loop? I bet that the compiler will inline it correctly and eliminate any performance gap. Getting rid of the duplicated code would be very worthwhile from a maintenance perspective.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can probably do this. I must admit that I was a bit concerned about potential function call overhead in such a hot loop, but this function also isn't likely to be used very often unless applications make use of non-standard complex number types.

HGOTO_ERROR(H5E_DATATYPE, H5E_CANTCONVERT, FAIL, "unable to convert data values");
}
else {
/* If floating-point types are the same, use specialized loop */
Copy link
Contributor

@qkoziol qkoziol Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same for floating-point type conversion? Could the code be shared with them?

{
size_t type_size;

if (dst_p->shared->type == H5T_FLOAT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be hoisted out of the loop

goto padding;

if (except_ret == H5T_CONV_UNHANDLED)
H5T__bit_set(d, (dst_atomic.offset + dst_atomic.prec - 1), (size_t)1, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using the H5T__bit_set, etc. routines so heavily now, I suggest moving them into one of the H5T headers and making them 'static inline', to speed them up.

@@ -185,27 +177,47 @@ H5T_get_force_conv(const H5T_t *dt)
*-------------------------------------------------------------------------
*/
herr_t
H5T__reverse_order(uint8_t *rev, uint8_t *s, size_t size, H5T_order_t order)
H5T__reverse_order(uint8_t *rev, uint8_t *s, const H5T_t *dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this again, the function name is very misleading. It should be "H5T__to_le_order" or similar. If you are on a little-endian machine, it definitely doesn't reverse the order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and ideally we would be able to refactor the conversion functions to not have to byte-swap data into little-endian order in the first place, removing the need for reversing the buffer before calling the exception callback function.

src/H5Tpkg.h Show resolved Hide resolved
src/H5Tpkg.h Outdated Show resolved Hide resolved
tools/lib/h5tools.h Outdated Show resolved Hide resolved
"b" +4 IEEE 64-bit little-endian float
} 12 bytes
"b" +8 IEEE 64-bit little-endian float
} 16 bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that the file must have previously been generated on some system where the compiler used different structure padding.

Copy link
Contributor

@qkoziol qkoziol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing amount of great work here!

Four changes seem important to me:

  • Add the fuzzing guards on the complex datatype's encoding.
  • Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.
  • Switch H5Tset_size's behavior
  • Remove the 'homogeneous' flag from the complex datatype struct.

@jhendersonHDF
Copy link
Collaborator Author

* Add the fuzzing guards on the complex datatype's encoding.

The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.

What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

@qkoziol
Copy link
Contributor

qkoziol commented Jul 10, 2024

* Add the fuzzing guards on the complex datatype's encoding.

The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

Sorry, I meant decoding.

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.

What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

🤔 I worked through all the permutations I can think of again, and I think it'll be OK. I was concerned about the possible addition of another parent datatype for the heterogeneous case, but I think the existing flag will be OK for the forward/backward compatibility So, I think we can skip adding the version field.

@jhendersonHDF
Copy link
Collaborator Author

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

@derobins derobins added this to the 1.16.0 milestone Aug 15, 2024
@qkoziol
Copy link
Contributor

qkoziol commented Aug 26, 2024

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

Perhaps we should have a short, small audience meeting to discuss the H5Tset_size() issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Build CMake, Autotools Component - C Library Core C library issues (usually in the src directory) Component - Documentation Doxygen, markdown, etc. Component - Fortran Fortran wrappers Component - High-Level Library Code in the hl directory Component - Java Java wrappers Component - Misc Anything else (CODEOWNERS, etc.) Component - Testing Code in test or testpar directories, GitHub workflows Component - Tools Command-line tools like h5dump, includes high-level tools Merge - Develop Only This cannot be merged to previous versions of HDF5 (file format or breaking API changes) Priority - 0. Blocker ⛔ This MUST be merged for the release to happen Type - New Feature Add a new API call, functionality, or tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants