Implement support for complex number datatypes #4630

jhendersonHDF · 2024-07-08T15:55:49Z

Adds the new datatype class H5T_COMPLEX

Adds the new API function H5Tcomplex_create which creates a complex number datatype from an ID of a base floating-point datatype

Adds the new feature check macros H5_HAVE_COMPLEX_NUMBERS and H5_HAVE_C99_COMPLEX_NUMBERS

Adds the new datatype size macros H5_SIZEOF_FLOAT_COMPLEX, H5_SIZEOF_DOUBLE_COMPLEX and H5_SIZEOF_LONG_DOUBLE_COMPLEX

Adds the new datatype ID macros H5T_NATIVE_FLOAT_COMPLEX, H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX, H5T_CPLX_IEEE_F16LE, H5T_CPLX_IEEE_F16BE,
H5T_CPLX_IEEE_F32LE, H5T_CPLX_IEEE_F32BE,
H5T_CPLX_IEEE_F64LE and H5T_CPLX_IEEE_F64BE

Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes

Adds a special conversion path between complex number datatypes and array or compound datatypes where the in-memory layout of data is the same between the datatypes and data can be converted directly

Adds support for complex number datatypes to the h5dump, h5ls and h5diff/ph5diff tools. Allows h5dump '-m' option to change floating-point printing format for float complex and double complex datatypes, as well as long double complex if it has the same size as double complex

Adds minimal support to the h5watch and h5import tools

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the Java wrappers. Also adds initial, untested support to the JNI for future use with HDFView

Adds support for just the H5T_COMPLEX datatype class to the Fortran wrappers

Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the high level library H5LT interface for use with the H5LTtext_to_dtype and H5LTdtype_to_text functions

Changes some usages of "complex" in the library since it conflicts with the "complex" keyword from the complex.h header. Also changes various usages of the word "complex" throughout the library to distinguish compound datatypes from complex datatypes.

Adds the new datatype class H5T_COMPLEX Adds the new API function H5Tcomplex_create which creates a complex number datatype from an ID of a base floating-point datatype Adds the new feature check macros H5_HAVE_COMPLEX_NUMBERS and H5_HAVE_C99_COMPLEX_NUMBERS Adds the new datatype size macros H5_SIZEOF_FLOAT_COMPLEX, H5_SIZEOF_DOUBLE_COMPLEX and H5_SIZEOF_LONG_DOUBLE_COMPLEX Adds the new datatype ID macros H5T_NATIVE_FLOAT_COMPLEX, H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX, H5T_CPLX_IEEE_F16LE, H5T_CPLX_IEEE_F16BE, H5T_CPLX_IEEE_F32LE, H5T_CPLX_IEEE_F32BE, H5T_CPLX_IEEE_F64LE and H5T_CPLX_IEEE_F64BE Adds hard and soft datatype conversion paths between complex number datatypes and all the integer and floating-point datatypes, as well as between other complex number datatypes Adds a special conversion path between complex number datatypes and array or compound datatypes where the in-memory layout of data is the same between the datatypes and data can be converted directly Adds support for complex number datatypes to the h5dump, h5ls and h5diff/ph5diff tools. Allows h5dump '-m' option to change floating-point printing format for float complex and double complex datatypes, as well as long double complex if it has the same size as double complex Adds minimal support to the h5watch and h5import tools Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the Java wrappers. Also adds initial, untested support to the JNI for future use with HDFView Adds support for just the H5T_COMPLEX datatype class to the Fortran wrappers Adds support for the predefined complex number datatypes and H5Tcomplex_create function to the high level library H5LT interface for use with the H5LTtext_to_dtype and H5LTdtype_to_text functions Changes some usages of "complex" in the library since it conflicts with the "complex" keyword from the complex.h header. Also changes various usages of the word "complex" throughout the library to distinguish compound datatypes from complex datatypes.

jhendersonHDF · 2024-07-08T16:07:09Z

Additional complex number features that haven't been implemented yet are mentioned in #4631.

jhendersonHDF · 2024-07-08T16:09:17Z

fortran/src/H5Tf.c

-      if (c_classtype == H5T_OPAQUE)    *classtype = H5T_OPAQUE_F;
-      if (c_classtype == H5T_COMPOUND)  *classtype = H5T_COMPOUND_F;
-      if (c_classtype == H5T_REFERENCE) *classtype = H5T_REFERENCE_F;
-      if (c_classtype == H5T_ENUM)      *classtype = H5T_ENUM_F;


Didn't look like this commented out section was useful, but can be put back.

release_docs/RELEASE.txt

jhendersonHDF · 2024-07-08T16:21:31Z

test/dsets.c

@@ -3915,7 +3915,7 @@ test_nbit_compound_2(hid_t file)
        unsigned int v;
        char         b[2][2];
        atomic       d[2][2];
-    } complex;


This structure had to be renamed since "complex" is a keyword from complex.h and conflicted

jhendersonHDF · 2024-07-08T16:26:17Z

tools/src/h5dump/h5dump_xml.c

h5dump XML support was added and I verified it with a couple files, but no XML tests were added because the XML output always prints data (either dataset data or fill values) and long double/long double _Complex data displays differently across platforms due to sizeof(long double) == sizeof(double) with MSVC. h5dump would need some work for these tests to be portable.

jhendersonHDF · 2024-07-08T16:27:01Z

tools/src/h5repack/h5repack_opttable.c

 *
 * Return: void
 *-------------------------------------------------------------------------
 */

 static void
-aux_tblinsert_filter(pack_opttbl_t *table, unsigned int I, filter_info_t filt)


I had to be changed since it is a keyword from the complex.h header and conflicted.

src/H5Tconv.c

qkoziol · 2024-07-10T16:30:54Z

src/H5Tconv.c

@@ -185,27 +177,47 @@ H5T_get_force_conv(const H5T_t *dt)
 *-------------------------------------------------------------------------
 */
 herr_t
-H5T__reverse_order(uint8_t *rev, uint8_t *s, size_t size, H5T_order_t order)
+H5T__reverse_order(uint8_t *rev, uint8_t *s, const H5T_t *dtype)


Good catch!

qkoziol · 2024-07-10T16:33:24Z

src/H5Tconv.c

@@ -307,6 +348,23 @@ H5T__conv_order(const H5T_t *src, const H5T_t *dst, H5T_cdata_t *cdata,
                    } /* end if */
                    break;

+                case H5T_COMPLEX: {
+                    const H5T_shared_t *src_base_sh = src->shared->parent->shared;


Maybe use H5T_cmp() here instead?

It would be more compact, though H5T_cmp will compare the datatypes as different if they have different byte orders and then there's no information about whether or not that was the only thing different about the datatypes.

src/H5Tconv_complex.c

qkoziol · 2024-07-10T16:59:38Z

src/H5Tconv_complex.c

+                    realval_type == H5T_CONV_FLOAT_SPECVAL_NEGINF ||
+                    imagval_type == H5T_CONV_FLOAT_SPECVAL_POSINF ||
+                    imagval_type == H5T_CONV_FLOAT_SPECVAL_NEGINF) {
+                    H5T__bit_copy(d, dst_atomic.u.f.sign, s, src_atomic.u.f.sign, (size_t)1);


Could you copy the global infinity values here?

Sure, as long as the type being converted matches one of the native types. Would probably make it a bit uglier here with special-case code and realistically this function is only ever going to be used for non-standard complex types like a complex of float16 anyway. I'm sure several of the library's software conversion functions could be heavily optimized like this, though I don't know if it's worth doing for this PR.

qkoziol · 2024-07-10T17:00:44Z

src/H5Tconv_complex.c

+                }
+                else {
+                    /* There are many NaN values, so we just set all bits of the significand. */
+                    if (realval_type == H5T_CONV_FLOAT_SPECVAL_NAN) {


Would probably be worthwhile setting up globals for NANs also (in additional to infinities), so that memcpy() could be used here to speed things up.

qkoziol · 2024-07-10T17:02:09Z

src/H5Tconv_complex.c

+        }
+
+        if (real_zero) {
+            H5T__bit_copy(d, dst_atomic.u.f.sign, s, src_atomic.u.f.sign, (size_t)1);


Globals for memcpy() would be good here too.

qkoziol · 2024-07-10T17:07:27Z

src/H5Tconv_complex.c

+    FUNC_LEAVE_NOAPI(ret_value)
+} /* end H5T__conv_complex_loop() */
+
+/*-------------------------------------------------------------------------


Maybe move this code to a 'static inline' routine in H5Tconv.h and invoke it from H5T__conv_f_f_loop and H5T__conv_complex_loop? I bet that the compiler will inline it correctly and eliminate any performance gap. Getting rid of the duplicated code would be very worthwhile from a maintenance perspective.

Sure, I can probably do this. I must admit that I was a bit concerned about potential function call overhead in such a hot loop, but this function also isn't likely to be used very often unless applications make use of non-standard complex number types.

qkoziol · 2024-07-10T17:15:07Z

src/H5Tconv_complex.c

+                    HGOTO_ERROR(H5E_DATATYPE, H5E_CANTCONVERT, FAIL, "unable to convert data values");
+            }
+            else {
+                /* If floating-point types are the same, use specialized loop */


Isn't this the same for floating-point type conversion? Could the code be shared with them?

qkoziol · 2024-07-10T18:09:23Z

src/H5Tconv_float.c

+        {
+            size_t type_size;
+
+            if (dst_p->shared->type == H5T_FLOAT)


This could be hoisted out of the loop

qkoziol · 2024-07-10T18:19:40Z

src/H5Tconv_float.c

-                        goto padding;
+
+                    if (except_ret == H5T_CONV_UNHANDLED)
+                        H5T__bit_set(d, (dst_atomic.offset + dst_atomic.prec - 1), (size_t)1, true);


We're using the H5T__bit_set, etc. routines so heavily now, I suggest moving them into one of the H5T headers and making them 'static inline', to speed them up.

qkoziol · 2024-07-10T18:39:17Z

src/H5Tconv.c

@@ -185,27 +177,47 @@ H5T_get_force_conv(const H5T_t *dt)
 *-------------------------------------------------------------------------
 */
 herr_t
-H5T__reverse_order(uint8_t *rev, uint8_t *s, size_t size, H5T_order_t order)
+H5T__reverse_order(uint8_t *rev, uint8_t *s, const H5T_t *dtype)


After looking at this again, the function name is very misleading. It should be "H5T__to_le_order" or similar. If you are on a little-endian machine, it definitely doesn't reverse the order.

Agreed and ideally we would be able to refactor the conversion functions to not have to byte-swap data into little-endian order in the first place, removing the need for reversing the buffer before calling the exception callback function.

src/H5Tpkg.h

tools/lib/h5tools.h

qkoziol · 2024-07-10T20:30:11Z

tools/test/h5ls/expected/tattr2.ls

-                   "b"                +4    IEEE 64-bit little-endian float
-               } 12 bytes
+                   "b"                +8    IEEE 64-bit little-endian float
+               } 16 bytes


Why did this change?

My guess is that the file must have previously been generated on some system where the compiler used different structure padding.

qkoziol

Amazing amount of great work here!

Four changes seem important to me:

Add the fuzzing guards on the complex datatype's encoding.
Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.
Switch H5Tset_size's behavior
Remove the 'homogeneous' flag from the complex datatype struct.

jhendersonHDF · 2024-07-10T20:56:21Z

* Add the fuzzing guards on the complex datatype's encoding.

The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.

What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

qkoziol · 2024-07-10T21:11:50Z

* Add the fuzzing guards on the complex datatype's encoding.
The previous comment was made in the decode helper routine. Do we want checks both during encode and decode?

Sorry, I meant decoding.

* Add the version field to the complex datatype encoding, so that if we ever add heterogeneous complex types, we don't have to write as complex of a parser.
What's the purpose of having a complex number-specific encoding version field here? Is it just to check that no heterogeneous complex types are allowed in, e.g. version 0?

🤔 I worked through all the permutations I can think of again, and I think it'll be OK. I was concerned about the possible addition of another parent datatype for the heterogeneous case, but I think the existing flag will be OK for the forward/backward compatibility So, I think we can skip adding the version field.

jhendersonHDF · 2024-08-08T17:26:22Z

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

qkoziol · 2024-08-26T20:00:11Z

Most of the major previous comments should be addressed now. I haven't made the change to H5Tset_size yet as I still feel fairly strongly about that and I don't think we necessarily came to any real conclusion on it. There are various other comments on other possible optimizations or refactoring that could be made, but many of them seem like things that could fall outside this PR.

Perhaps we should have a short, small audience meeting to discuss the H5Tset_size() issue?

jhendersonHDF requested review from byrnHDF, derobins, brtnfld, epourmal, lrknox, fortnern, qkoziol, vchoi-hdfgroup, bmribler, glennsong09 and mattjala as code owners July 8, 2024 15:55

jhendersonHDF commented Jul 8, 2024

View reviewed changes

release_docs/RELEASE.txt Outdated Show resolved Hide resolved

jhendersonHDF commented Jul 8, 2024

View reviewed changes

qkoziol reviewed Jul 9, 2024

View reviewed changes

src/H5Tconv.c Show resolved Hide resolved

jhendersonHDF added 3 commits July 10, 2024 11:04

Rename standard complex types

fcedcd6

Add note to RELEASE.txt about conversion paths

77a0c95

Put back comment about legacy VAX files

ae4e075