Eliminate MD5 usage by adopting Project wide SHA-512 checksums #951

Open
in-manishkr wants to merge 1 commit into openmainframeproject:master from in-manishkr:enhance_checksum_algo

Conversation

@in-manishkr

@in-manishkr in-manishkr commented Jun 11, 2026

Contributor

Summary

Replace MD5-based checksum generation with SHA-512 and standardize MD5-specific naming to generic checksum terminology throughout the codebase.

MD5 is no longer considered secure due to known collision vulnerabilities. This change improves file integrity verification by adopting SHA-512 and aligns the codebase with modern security practices.

In addition, a database migration script has been added to migrate existing image metadata and recalculate checksums for restored databases.

Changes

Core implementation

  • Replace hashlib.md5() with hashlib.sha512()
  • Rename _get_md5sum() to _get_checksum()
  • Update image import, capture, and file upload workflows to use SHA-512 checksums
  • Replace MD5-specific references with generic checksum naming
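For illustration, a SHA-512 file checksum helper along the lines of the renamed _get_checksum() might look like the sketch below. This is not the code from the PR: the function name get_checksum, the chunk_size parameter, and the chunked-read approach are assumptions made here to keep memory use flat for large image files.

```python
import hashlib


def get_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest (128 characters) of the file at `path`.

    Hypothetical helper: reads the file in chunks so that large images
    do not have to fit in memory.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Swapping hashlib.sha512() for hashlib.md5() in this sketch would give the 32-character digest the old code produced, which is why the length checks elsewhere in the change have to move from 32 to 128.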

Database

  • Rename image table column from md5sum to checksum
  • Update database APIs, queries, and mappings to use the new column name
  • Update image record creation and retrieval logic to use checksum values

API and validation

  • Rename API parameter and response fields from md5sum to checksum
  • Update checksum validation from 32-character MD5 hashes to 128-character SHA-512 hashes
  • Update API documentation and examples accordingly
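As a rough sketch of the validation change, the check below accepts only 128 hexadecimal characters. The helper name is_sha512_hex is hypothetical; the actual PR expresses this as a schema pattern in parameter_types.py rather than as a function.

```python
import re

# 128 hex characters = 512 bits; an MD5 digest would be 32 hex characters.
_SHA512_HEX = re.compile(r"[0-9a-fA-F]{128}")


def is_sha512_hex(value):
    """Return True if `value` is a string of exactly 128 hex characters."""
    # fullmatch avoids the "$ also matches before a trailing newline" pitfall
    return isinstance(value, str) and _SHA512_HEX.fullmatch(value) is not None
```

With this check a legacy 32-character MD5 value is rejected on length alone, which matches the "rejects legacy MD5 values" behaviour listed under Testing.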

Tests

  • Update unit tests to use SHA-512 checksum values
  • Rename MD5-specific test methods, mocks, and assertions
  • Update validation and API handler test coverage
  • Update database and SMT client test cases

Documentation

  • Replace MD5-specific references with generic checksum terminology
  • Update REST API documentation, parameter definitions, and examples
  • Update image import/export documentation and sample payloads

Migration tooling

  • Add feilong/database_migration_md5_to_sha512.py
  • Provide support for migrating existing image metadata from MD5 to SHA-512
  • Recalculate checksums for existing image records during migration

Security Impact

This change eliminates the use of MD5 for checksum generation and verification. SHA-512 provides significantly stronger collision resistance and better aligns with current security recommendations and compliance requirements.

Benefits include:

  • Improved protection against hash collision attacks
  • Stronger file integrity verification
  • Alignment with modern cryptographic best practices
  • Removal of dependency on a deprecated hashing algorithm

Compatibility Notes

Breaking Change

  • API field md5sum has been renamed to checksum
  • Database column md5sum has been renamed to checksum
  • SHA-512 checksums are now expected (128 hexadecimal characters)
  • Existing integrations that submit or consume MD5 values must be updated
  • Existing databases require migration before use with this change

Database Migration Requirement

For deployments restoring or upgrading an existing sdk_image.sqlite database, the following migration script must be executed after the database is restored:

python feilong/database_migration_md5_to_sha512.py

The migration script:

  • Renames the database column from md5sum to checksum
  • Recalculates SHA-512 checksums for all existing image records
  • Updates image metadata to match the new checksum format
  • Ensures compatibility with the updated SDK schema

Failure to run the migration script after database restoration may result in schema mismatches or invalid checksum data.
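The migration steps listed above could be sketched roughly as follows. This is not the contents of database_migration_md5_to_sha512.py: the table name image, the key column imagename, and the locate_image callback (which maps an image name to its file on disk) are assumptions for illustration, and ALTER TABLE ... RENAME COLUMN requires SQLite 3.25 or newer.

```python
import hashlib
import sqlite3


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of the file at `path`."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(conn, locate_image):
    """Rename md5sum to checksum and recalculate every stored checksum.

    `conn` is an open sqlite3 connection; `locate_image` is a hypothetical
    callable that returns the file path for a given image name.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(image)")]
    if "md5sum" in columns:  # skip the rename if it was already done
        conn.execute("ALTER TABLE image RENAME COLUMN md5sum TO checksum")
    names = [row[0] for row in conn.execute("SELECT imagename FROM image")]
    for name in names:
        conn.execute(
            "UPDATE image SET checksum = ? WHERE imagename = ?",
            (sha512_of_file(locate_image(name)), name),
        )
    conn.commit()
```

Checking for the old column first makes the sketch safe to run twice, which is convenient when a restore and an upgrade may happen in either order.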

Upgrade Considerations

Operators upgrading existing environments should:

  1. Back up the existing database.
  2. Restore the sdk_image.sqlite database if required.
  3. Run feilong/database_migration_md5_to_sha512.py.
  4. Verify image records contain valid SHA-512 checksums.
  5. Upgrade API clients to use the checksum field and SHA-512 values.
  6. Validate image import/export workflows after upgrade.

Testing

  • Updated unit tests for checksum generation and validation
  • Verified image import workflows with SHA-512 checksums
  • Verified file upload checksum generation
  • Verified API validation accepts SHA-512 and rejects legacy MD5 values
  • Updated database-related test coverage
  • Updated documentation examples and API references

@in-manishkr in-manishkr force-pushed the enhance_checksum_algo branch 3 times, most recently from c029d90 to 1d116be Compare June 11, 2026 18:00
@Rajat-0 Rajat-0 requested a review from Bischoff June 12, 2026 05:28

@Bischoff Bischoff left a comment

Contributor

Excellent, thanks. I did not see problems with the code and I am approving it.

WARNING: I am afraid that other products depending on Feilong will need to be updated due to the API change. I am thinking in particular of the Go Connector for Feilong and the Terraform provider for Feilong that I maintain, but the ICIC team is probably affected too, through the OpenStack code that uses Feilong.

PLEASE: when this is merged, close issue #888.

@in-manishkr

Contributor Author

I agree, which is why I have included a Python script, database_migration_md5_to_sha256.py, that recalculates the checksums for all existing images and updates the checksum column with the newly calculated values.

@Bischoff

Contributor

Thank you for that database upgrade script, Manish.

But this is not only about database contents. Your PR is also a breaking change for the API, which means that every program that uses the Feilong API will have to be rewritten.

This could be mitigated though by accepting both the old parameter name (image_md5sum) and the new parameter name (image_checksum) when parsing an API call.

@in-manishkr

Contributor Author

Yes, that's a good suggestion. I will incorporate these changes ASAP for backward compatibility.

@in-manishkr in-manishkr force-pushed the enhance_checksum_algo branch from 1d116be to f99b689 Compare June 16, 2026 07:57
@in-manishkr

Contributor Author

@Bischoff

I have incorporated the requested changes as a fallback mechanism:

expect_checksum = image_meta.get('checksum', image_meta.get('md5sum'))

Please re-review.

@Bischoff

Bischoff commented Jun 16, 2026

Contributor

Thanks. You might also need to accept both values in the validation code (off the top of my head: zvmsdk/sdkwsgi/validation/parameter_types.py, but there might be other places as well).

@in-manishkr

in-manishkr commented Jun 18, 2026

Contributor Author

@Bischoff Just wanted to ask: instead of SHA256, can we use SHA512 as the hashing algorithm?

The checksum column of sdk_image.sqlite already supports 512 characters, so storing SHA512 digests (128 hexadecimal characters) will not be a problem.

@in-manishkr in-manishkr force-pushed the enhance_checksum_algo branch 2 times, most recently from da5677d to c142f6a Compare June 18, 2026 16:25
@in-manishkr in-manishkr changed the title Eliminate MD5 usage by adopting Project wide SHA-256 checksums Eliminate MD5 usage by adopting Project wide SHA-512 checksums Jun 18, 2026
@in-manishkr

Contributor Author

@Bischoff I've updated the default algorithm to SHA512 across the entire project, and updated parameter_types.py as requested to accept md5sum as input for backward compatibility.

A couple of small enhancements:

Glance supports os_hash_algo and os_hash_value based on the algorithm configured when creating an image. We now accept these two parameters optionally — when provided, we calculate the image hash using the algorithm specified by the caller and compare it against the received os_hash_value. If os_hash_algo and os_hash_value aren't passed, we fall back to the checksum and md5sum keys to verify image integrity. In either case, we still store the SHA512 value in the zvmsdk image database.
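The verification order described above might be sketched like this. The function names, the SUPPORTED_ALGOS allow-list, and the ValueError on mismatch are all assumptions made for the example, not the code in this PR.

```python
import hashlib

# Assumption: restrict caller-chosen algorithms to a small allow-list.
SUPPORTED_ALGOS = {"sha256", "sha384", "sha512"}


def file_digest(path, algo, chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path` using algorithm `algo`."""
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, image_meta):
    """Verify the image, then return the SHA-512 digest to store."""
    algo = image_meta.get("os_hash_algo")
    expected = image_meta.get("os_hash_value")
    if algo and expected:
        # Caller named the algorithm explicitly (Glance-style metadata).
        algo = algo.lower()
        if algo not in SUPPORTED_ALGOS:
            raise ValueError("unsupported os_hash_algo: %s" % algo)
    else:
        # Fall back to the checksum key, then to the legacy md5sum key;
        # either way the value is compared against a SHA-512 digest.
        algo = "sha512"
        expected = image_meta.get("checksum", image_meta.get("md5sum"))
    if expected is not None and file_digest(path, algo) != expected.lower():
        raise ValueError("image checksum mismatch")
    # Whatever algorithm was used to verify, SHA-512 is what gets stored.
    return file_digest(path, "sha512")
```

The sketch reads the file twice when a non-SHA-512 algorithm is requested; a real implementation would probably feed both digests in a single pass.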

Let me know your thoughts on this!

@in-manishkr in-manishkr requested a review from Bischoff June 18, 2026 16:33
@in-manishkr in-manishkr force-pushed the enhance_checksum_algo branch from c142f6a to c34802d Compare June 22, 2026 09:03
Replace MD5-based checksum generation with SHA-512 to address known
MD5 collision vulnerabilities and improve image integrity guarantees.
SHA-512 produces a 128-character hex digest, significantly stronger
than MD5's 32-character output.

- smtclient: rename _get_md5sum() -> _get_checksum(), switch hashlib
  call to sha512(); update all callers and internal variable names
- database: rename md5sum column to checksum in image table DDL, SQL
  INSERT statements, and image_keys_list mapping
- parameter_types: replace md5sum field with checksum (128-char
  SHA-512 pattern); retain md5sum as fallback key in import
  comparison for backward compatibility with older callers
- returncode: update error messages rs=3/4 to reference checksum
- file.py: switch file_import to sha512, rename md5sum return key
  to checksum
- api.py: update image_import/image_export docstrings
- docs: update parameters.yaml, restapi.rst, makeimage.rst
- tests: update all _get_md5sum mock patches to _get_checksum,
  replace 32-char MD5 sample hashes with 128-char SHA-512 values
- add database_migration_md5_to_sha512.py to recalculate checksums
  for existing images already stored in sdk_image.sqlite

Signed-off-by: Manish Kumar <Manish.Kumar176@ibm.com>
@in-manishkr in-manishkr force-pushed the enhance_checksum_algo branch from c34802d to decb185 Compare June 22, 2026 09:07
@in-manishkr

Contributor Author

@Bischoff Could you please re-review this PR?

@Bischoff Bischoff left a comment

Contributor

@Bischoff Just wanted to know, if instead of using SHA256, can we use SHA512 as the hashing algorithm ?

-> I do not see any problem with that.

Did you do a test that an old API call with a md5sum parameter still kind of works? 🤔

Apart from that, your code looks okay, re-approving.

Don't forget to close issue #888 and PR #931 when merging please

'os_hash_algo': {'type': 'string'},
'os_hash_value': {'type': 'string', 'pattern': '^[0-9a-fA-F]+$'},
# md5sum kept for backward compatibility; 32 hexadecimal characters
'md5sum': {'type': 'string', 'pattern': '^[0-9a-fA-F]{32}$'},

Contributor

"md5sum" is still accepted, good (compatibility).

Contributor Author

@Bischoff

We currently accept md5sum as an input parameter, but checksum validation will fail if the provided value is actually an MD5 checksum. This is because we always calculate a SHA512 checksum and compare it against the value supplied via the parameter.

I would recommend removing support for md5sum entirely and skipping image integrity validation when only an MD5 checksum is provided.

Otherwise, the implementation feels like a half-baked solution, as it accepts an md5sum parameter without actually supporting MD5-based validation.

Alternatively, the end user could provide os_hash_value and os_hash_algo as input parameters to determine which checksum algorithm is used to validate image integrity during image import into zvmsdk.

Regardless of the algorithm used for validation, the checksum persisted in the database will always be stored as a SHA512 checksum.

Contributor Author

I have tested the changes with all the keys:

os_hash_algo and os_hash_value
md5sum
checksum

Contributor

Hi @in-manishkr,

thanks for your efforts.

I do not think it makes sense to keep md5sum parameter if Feilong does not accept MD5 sums for real. It would be better to cleanly remove all validation, support, tests, and related documentation in that case, apart from a note that it has been deprecated.

Still, I would prefer old code not to break, and MD5 sums to be accepted for real, maybe issuing a security warning, to avoid breaking old applications.
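One possible reading of this suggestion, as a sketch only: infer the algorithm from the digest length, verify legacy MD5 values with MD5, and emit a warning when doing so. The names verify_bytes and ALGO_BY_HEX_LENGTH are hypothetical, the choice of DeprecationWarning is an assumption, and the sketch hashes in-memory bytes rather than a file to stay short.

```python
import hashlib
import warnings

# Assumption: the digest length alone identifies the algorithm.
ALGO_BY_HEX_LENGTH = {32: "md5", 128: "sha512"}


def verify_bytes(data, expected):
    """Return True if `expected` matches the MD5 or SHA-512 digest of `data`."""
    algo = ALGO_BY_HEX_LENGTH.get(len(expected))
    if algo is None:
        raise ValueError("unsupported checksum length: %d" % len(expected))
    if algo == "md5":
        # Legacy callers keep working, but are told to move on.
        warnings.warn(
            "MD5 checksums are deprecated; please supply a SHA-512 checksum",
            DeprecationWarning,
            stacklevel=2,
        )
    return hashlib.new(algo, data).hexdigest() == expected.lower()
```

Under this approach an old client that still sends a 32-character MD5 value gets a real integrity check plus a warning, instead of a comparison against a SHA-512 digest that can never succeed.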

What are os_hash_algo and os_hash_value? Aren't they adding complexity?

@Rajat-0, any opinion?

@Rajat-0 Rajat-0 self-requested a review June 29, 2026 04:44