
CI Build Updates #5308

Merged
ethomson merged 15 commits into master from ethomson/cifix
Nov 27, 2019

Conversation

@ethomson
Member

@ethomson ethomson commented Nov 23, 2019

Our CI builds are currently flaky. Some of these issues are related to the upgrade to xenial for our Linux build and test runs. Some of them snuck in because that move destabilized our CI and we didn't catch problems.

Here are some issues:

  1. valgrind is reporting errors that it should be suppressing. Our suppression file targets symbols in libssh2.so, but we currently link libssh2 statically, so the suppressions can't match those symbols and we see a number of valgrind errors.
  2. valgrind errors aren't failing the builds. The version of valgrind that we were using from the PPA is broken and --error-exitcode is ignored. 😳
  3. OpenSSL has several new issues that need to be suppressed. It appears that this is an interaction between the versions of OpenSSL, gcc and valgrind that we have on the build/test image reporting a false positive for uninitialized memory. As best I can tell, valgrind misses some subtle memory initializations with some gcc optimizations. Amusingly, these are only noticed when talking to Azure Repos. Presumably the Windows TLS stack or the ciphers in use are sufficiently different for valgrind to notice these. (But only sometimes, sigh.)

Fixes:

  1. Build a shared library during the libssh2 compilation step so that we can target suppressions against symbols in libssh2.so.
  2. Build the latest version of valgrind instead of using the PPA, so that --error-exitcode works again.
  3. Quiet valgrind down.
    i. Extend the suppressions for valgrind to include uninitialized usage stemming from OpenSSL.
    ii. Mark memory returned from OpenSSL as initialized explicitly.

In addition, there are a few build process enhancements:

  1. Provide a default CMake generator in the build script. It explicitly passes -G, so it needs a default to supply if none was given in the environment.
  2. Show the distribution information again.
  3. Break the Dockerfile up into stages. This lets docker cache each stage and re-use unchanged ones, avoiding (e.g.) long apt installs when the dependencies haven't actually changed. Now docker build will pick up again from the first modified stage, re-using the unchanged stages from the cache.
  4. Don't delete the apt cache. This aids in local debugging, so that you can apt-get install ... tools in the container. This bloats the image somewhat, but since we don't store it in a registry, this is of limited pain.
  5. Cache docker layers in Azure Pipelines. This prevents us from needing to rebuild the entire image on every build.
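
The default generator from the first enhancement can be sketched roughly as follows. This is a minimal illustration, not the actual build script: the fallback value "Unix Makefiles" is an assumption, and the final cmake invocation is shown only as a comment.

```sh
#!/bin/sh
# Use the generator from the environment if one was given; otherwise fall
# back to a default. "Unix Makefiles" is an assumed default for illustration.
CMAKE_GENERATOR="${CMAKE_GENERATOR:-Unix Makefiles}"

echo "Using CMake generator: ${CMAKE_GENERATOR}"

# The build script could then always pass -G safely, for example:
#   cmake "${SOURCE_DIR}" -G "${CMAKE_GENERATOR}"
# (SOURCE_DIR is a hypothetical variable name.)
```

With this in place, running the script with no environment setup uses the fallback, while `CMAKE_GENERATOR=Ninja ./build.sh` (hypothetical script name) would still override it.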

Provide a sane default for `CMAKE_GENERATOR` in the build script so that
it can be invoked without having to set that in the environment.

The lsb-release command is missing on our images; just show the
information from the file instead of relying on it.
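
A minimal sketch of reading the distribution information from a file rather than calling the `lsb_release` tool might look like this. The function name `show_distribution` is hypothetical, and the default path `/etc/lsb-release` is an assumption about which file is meant.

```sh
#!/bin/sh
# Print distribution details from a release file instead of relying on the
# lsb_release command. The default path is an assumption.
show_distribution() {
    release_file="${1:-/etc/lsb-release}"
    if [ -r "$release_file" ]; then
        cat "$release_file"
    else
        echo "No release file found at ${release_file}"
    fi
}

show_distribution
```

Taking the path as an optional argument keeps the function usable on images that only ship a different release file.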

libcrypto will read uninitialized memory as entropy.  Suppress warnings
from this behavior.

@ethomson
Member Author

It gets better: the version of valgrind on the Ubuntu Xenial images has a broken --error-exitcode.

root@40e03a14632b:/tmp# valgrind -q --leak-check=full --show-reachable=yes --error-exitcode=2 --suppressions=/src/script/valgrind.supp  ./foo
==7019== 1,024 bytes in 1 blocks are definitely lost in loss record 1 of 1
==7019==    at 0x4C2DB4F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7019==    by 0x40059E: main (in /tmp/foo)
==7019==
root@40e03a14632b:/tmp# echo $?
0

Sigh.

@ethomson ethomson force-pushed the ethomson/cifix branch 7 times, most recently from 2c16b20 to 2ceeb9d Compare November 23, 2019 12:01

Deleting the apt cache can be helpful for reducing the size of a
container, but since we don't push it anywhere, it only hinders our
ability to debug problems while working on the container.  Keep it.

Use a multi-stage docker build so that we can cache early stages and not
need to download the apt-provided dependencies during every build (when
only later stages change).

The valgrind in the PPA is broken and ignores `--error-exitcode`.
Build and install our own.

Our docker builds are getting expensive, let's cache some of this.

We currently talk to Azure Repos for executing an online test
(online::clone::path_whitespace).  Add a simpler test to talk to Azure
Repos to make it obvious that strange test failures are not likely the
whitespace in the path, but actually a function of talking to Azure
Repos itself.

valgrind will warn that OpenSSL will use undefined data in connect/read
when talking to certain other TLS stacks.  Thankfully, this only seems
to occur when gcc is the compiler, so hopefully valgrind is just
misunderstanding an optimization.  Regardless, suppress this warning.

Provide usage hints to valgrind.  We trust the data coming back from
OpenSSL to have been properly initialized.  (And if it has not, it's an
OpenSSL bug, not a libgit2 bug.)

We previously took the `VALGRIND` option to CMake as a hint to disable
mmap.  Remove that; it's broken.  Now use it to pass on the `VALGRIND`
definition so that sources can provide valgrind hints.
@ethomson
Member Author

Merging this so that CI is reliable again.

@ethomson ethomson merged commit 7805122 into master Nov 27, 2019
Member

@pks-t pks-t left a comment

Thanks a lot for these changes, they all look good to me. I'm always baffled by the amount of broken stuff we're dealing with when using Ubuntu: a libssh2 that produces memory leaks and other problems by default, non-working valgrind options, a whole lot of suppressions...
Anyway, thanks for taking care of these issues!

    path: /tmp/dockercache
- script: |
    if [ -f /tmp/dockercache/${{parameters.docker.image}}.tar ]; then docker load < /tmp/dockercache/${{parameters.docker.image}}.tar; fi
  displayName: 'Load Docker cache'
Member


🎉 🎉 🎉 Great change
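
The cache-loading step quoted in this review thread can be sketched as a small shell function. This is an illustration under assumptions, not the pipeline's actual script: the function name `load_docker_cache` and its arguments are hypothetical, and only `docker load` reading a tarball from standard input is taken from the snippet above.

```sh
#!/bin/sh
# Load a previously saved image tarball into the local docker daemon if the
# cache contains one; otherwise report that a full build is needed.
# Arguments: $1 = cache directory, $2 = image name (both hypothetical).
load_docker_cache() {
    cache_dir="$1"
    image="$2"
    cache_file="${cache_dir}/${image}.tar"

    if [ -f "$cache_file" ]; then
        docker load < "$cache_file"
    else
        echo "No cached image at ${cache_file}; building from scratch"
    fi
}

# Example call (the image name is an assumption):
#   load_docker_cache /tmp/dockercache xenial
```

Because a missing tarball is not treated as an error, the first run on an empty cache simply falls through to a normal docker build.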


2 participants