Eliminate MD5 usage by adopting Project wide SHA-512 checksums #951
in-manishkr wants to merge 1 commit into
c029d90 to 1d116be
Excellent, thanks. I did not see problems with the code and I am approving it.
WARNING: I am afraid that other products depending on Feilong will need to be updated due to the API change. I am thinking in particular of the Go Connector for Feilong and the Terraform provider for Feilong that I maintain, but the ICIC guys are probably hit too, for the OpenStack code that uses Feilong.
PLEASE: when this is merged, close issue #888.
I agree with you, which is why I have included a Python script, database_migration_md5_to_sha256.py, which can be used to recalculate the checksums for all existing images and update the checksum column with the newly calculated values.
Thank you for that database upgrade script, Manish. But this is not only about database contents. Your PR is also a breaking change for the API, which means that every program that uses the Feilong API will have to be rewritten. This could be mitigated though by accepting both the old parameter name ( |
Yes, that's a good suggestion. I will incorporate these changes as soon as possible for backward compatibility.
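The backward-compatibility idea discussed above (accept both the old and the new parameter name) can be sketched as follows. This is only an illustration: the helper name `extract_checksum` and the dictionary argument `image_meta` are hypothetical and do not come from the Feilong code base; only the key names `checksum` and `md5sum` are taken from this thread.

```python
def extract_checksum(image_meta):
    """Return the checksum supplied by the caller, preferring the new key.

    Hypothetical helper: `image_meta` stands for the request metadata
    dictionary; it is not an actual Feilong data structure.
    """
    if "checksum" in image_meta:
        return image_meta["checksum"]
    # Fall back to the legacy key so that older callers keep working.
    return image_meta.get("md5sum")
```

With this shape, a caller that still sends `md5sum` gets its value read, while a caller that sends both keys has `checksum` take precedence.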
1d116be to f99b689
I have incorporated the requested changes as a fallback mechanism:
Please re-review.
Thanks. You might also need to accept both values in the validation code (off the top of my head: |
@Bischoff Just wanted to know: instead of using SHA256, can we use SHA512 as the hashing algorithm?
da5677d to c142f6a
@Bischoff I've updated the default algorithm to SHA512 across the entire project, and updated parameter_types.py as requested to accept md5sum as input for backward compatibility. A couple of small enhancements: Glance supports
Let me know your thoughts on this!
c142f6a to c34802d
Replace MD5-based checksum generation with SHA-512 to address known MD5 collision vulnerabilities and improve image integrity guarantees. SHA-512 produces a 128-character hex digest, significantly stronger than MD5's 32-character output.

- smtclient: rename _get_md5sum() -> _get_checksum(), switch hashlib call to sha512(); update all callers and internal variable names
- database: rename md5sum column to checksum in image table DDL, SQL INSERT statements, and image_keys_list mapping
- parameter_types: replace md5sum field with checksum (128-char SHA-512 pattern); retain md5sum as fallback key in import comparison for backward compatibility with older callers
- returncode: update error messages rs=3/4 to reference checksum
- file.py: switch file_import to sha512, rename md5sum return key to checksum
- api.py: update image_import/image_export docstrings
- docs: update parameters.yaml, restapi.rst, makeimage.rst
- tests: update all _get_md5sum mock patches to _get_checksum, replace 32-char MD5 sample hashes with 128-char SHA-512 values
- add database_migration_md5_to_sha512.py to recalculate checksums for existing images already stored in sdk_image.sqlite

Signed-off-by: Manish Kumar <Manish.Kumar176@ibm.com>
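As a rough illustration of the smtclient change described in the commit message, the sketch below computes a SHA-512 checksum of a file in chunks. The function name `get_checksum` and the chunk size are assumptions made for this example; it is not the actual `_get_checksum()` implementation in Feilong.

```python
import hashlib


def get_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of the file at `path`.

    Hypothetical stand-in for the renamed helper; only the use of
    hashlib.sha512() is taken from the commit message.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so that large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is always 128 hexadecimal characters long, which is why the validation pattern and the sample hashes in the tests had to change along with the algorithm.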
c34802d to decb185
@Bischoff Could you please re-review this PR?
> @Bischoff Just wanted to know: instead of using SHA256, can we use SHA512 as the hashing algorithm?

I do not see any problem with that.
Did you test that an old API call with an md5sum parameter still works? 🤔
Apart from that, your code looks okay, re-approving.
Please don't forget to close issue #888 and PR #931 when merging.
    'os_hash_algo': {'type': 'string'},
    'os_hash_value': {'type': 'string', 'pattern': '^[0-9a-fA-F]+$'},
    # md5sum kept for backward compatibility; 32 hexadecimal characters
    'md5sum': {'type': 'string', 'pattern': '^[0-9a-fA-F]{32}$'},
"md5sum" is still accepted, good (compatibility).
We currently accept md5sum as an input parameter, but checksum validation will fail if the provided value is actually an MD5 checksum. This is because we always calculate a SHA512 checksum and compare it against the value supplied via the parameter.
I would recommend removing support for md5sum entirely and skipping image integrity validation when only an MD5 checksum is provided.
Otherwise, the implementation feels like a half-baked solution, as it accepts an md5sum parameter without actually supporting MD5-based validation.
Alternatively, the end user could provide os_hash_value and os_hash_algo as input parameters to determine which checksum algorithm is used to validate image integrity during image import into zvmsdk.
Regardless of the algorithm used for validation, the checksum persisted in the database will always be stored as a SHA512 checksum.
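The alternative described above (validate with the algorithm the caller names, but always persist SHA-512) might look roughly like this. It is a sketch under assumptions: the function name `validate_and_store`, the `ALLOWED_ALGOS` set, and the in-memory `data` argument are all hypothetical; only the parameter names `os_hash_algo` and `os_hash_value` come from this thread.

```python
import hashlib

# Assumed whitelist for the example; the real set of accepted algorithms
# is not specified in this discussion.
ALLOWED_ALGOS = {"md5", "sha256", "sha512"}


def validate_and_store(data, os_hash_algo, os_hash_value):
    """Validate `data` with the caller's algorithm, return the SHA-512 to persist."""
    if os_hash_algo not in ALLOWED_ALGOS:
        raise ValueError("unsupported hash algorithm: %s" % os_hash_algo)
    actual = hashlib.new(os_hash_algo, data).hexdigest()
    if actual != os_hash_value.lower():
        raise ValueError("image checksum mismatch")
    # Whatever algorithm was used for validation, the stored value is SHA-512.
    return hashlib.sha512(data).hexdigest()
```

The design point is that the validation algorithm and the storage algorithm are decoupled, so the database contents stay uniform even when callers supply different hash types.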
I have tested the changes with all the keys:
- os_hash_algo and os_hash_value
- md5sum
- checksum
Hi @in-manishkr,
thanks for your efforts.
I do not think it makes sense to keep the md5sum parameter if Feilong does not really accept MD5 sums. In that case, it would be better to cleanly remove all validation, support, tests, and related documentation, apart from a note that it has been deprecated.
Still, I would prefer old code not to break, and MD5 sums to be accepted for real, maybe issuing a security warning, to avoid breaking old applications.
What are os_hash_algo and os_hash_value? Aren't they adding complexity?
@Rajat-0, any opinion?
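One way to read the suggestion above (accept MD5 sums for real, while issuing a security warning) is to infer the algorithm from the length of the supplied digest. The sketch below is an assumption-laden illustration: `verify_image` and `_ALGO_BY_LENGTH` are hypothetical names, and the thread does not state that Feilong will take this approach.

```python
import hashlib
import warnings

# Hex digest lengths of the two algorithms discussed in this thread.
_ALGO_BY_LENGTH = {32: "md5", 128: "sha512"}


def verify_image(data, expected):
    """Check `data` against `expected`, inferring the algorithm from its length."""
    algo = _ALGO_BY_LENGTH.get(len(expected))
    if algo is None:
        raise ValueError("unsupported checksum length: %d" % len(expected))
    if algo == "md5":
        # Old callers keep working, but are told that MD5 is discouraged.
        warnings.warn("MD5 checksums are deprecated; use SHA-512",
                      DeprecationWarning, stacklevel=2)
    actual = hashlib.new(algo, data).hexdigest()
    return actual == expected.lower()
```

This keeps old applications working without extra parameters, at the cost of tying algorithm selection to digest length.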
Summary
Replace MD5-based checksum generation with SHA-512 and standardize MD5-specific naming to generic checksum terminology throughout the codebase.
MD5 is no longer considered secure due to known collision vulnerabilities. This change improves file integrity verification by adopting SHA-512 and aligns the codebase with modern security practices.
In addition, a database migration script has been added to migrate existing image metadata and recalculate checksums for restored databases.
Changes
Core implementation
- Replace hashlib.md5() with hashlib.sha512()
- Rename _get_md5sum() to _get_checksum()
Database
- Rename md5sum to checksum
API and validation
- Rename md5sum to checksum
Tests
Documentation
Migration tooling
- feilong/database_migration_md5_to_sha512.py
Security Impact
This change eliminates the use of MD5 for checksum generation and verification. SHA-512 provides significantly stronger collision resistance and better aligns with current security recommendations and compliance requirements.
Benefits include:
Compatibility Notes
Breaking Change
- md5sum has been renamed to checksum
- md5sum has been renamed to checksum
Database Migration Requirement
For deployments restoring or upgrading an existing sdk_image.sqlite database, the following migration script must be executed after the database is restored:
The migration script:
- md5sum to checksum
Failure to run the migration script after database restoration may result in schema mismatches or invalid checksum data.
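For illustration, a migration of this kind could look roughly like the sketch below. It is not the contents of `feilong/database_migration_md5_to_sha512.py`: the column name `imagename` and the `read_image` callback are assumptions made for the example, and the table name `image` is inferred from the commit message. `ALTER TABLE ... RENAME COLUMN` needs SQLite 3.25 or newer.

```python
import hashlib
import sqlite3


def migrate(conn, read_image):
    """Rename md5sum to checksum and recompute every value as SHA-512.

    `conn` is an open sqlite3 connection. `read_image(name)` is a
    hypothetical callback returning the image bytes for a given name.
    The column `imagename` is an assumption for this sketch.
    """
    columns = [row[1] for row in conn.execute("PRAGMA table_info(image)")]
    if "md5sum" in columns:
        # Only rename when the old schema is still in place.
        conn.execute("ALTER TABLE image RENAME COLUMN md5sum TO checksum")
    names = [row[0] for row in conn.execute("SELECT imagename FROM image")]
    for name in names:
        digest = hashlib.sha512(read_image(name)).hexdigest()
        conn.execute("UPDATE image SET checksum = ? WHERE imagename = ?",
                     (digest, name))
    conn.commit()
```

Checking for the old column first makes the sketch safe to run more than once on the same database.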
Upgrade Considerations
Operators upgrading existing environments should:
- sdk_image.sqlite database if required.
- feilong/database_migration_md5_to_sha512.py.
- checksum field and SHA-512 values.
Testing