It all begins with the problem that the terminal window in IntelliJ doesn’t work anymore. This is the story of what I learned about it and how, including some workarounds until the bugs are fixed upstream.

The problem

The problem is a bit hard to describe. It started at some point in early June 2024. The symptom, essentially, is that the terminal window in IntelliJ IDEA isn’t working. If you open a new terminal, it doesn’t show anything. It stays empty. Since I use the terminal very intensively, it’s really annoying to constantly switch between the IntelliJ window and a separate terminal window (as opposed to just pressing Alt+F12).

If you open another terminal tab, the same thing happens: now you have two empty terminal windows where the prompt never appears. The terminal functionality is simply broken and unusable.

Clues

High CPU usage

If you start IntelliJ IDEA as usual and open a local terminal, nothing happens. Just a cursor in the top left is shown, but no prompt. At some point, you’ll notice that the CPU usage is quite high. If you look at the processes, e.g. with htop, you’ll notice that an IntelliJ IDEA process uses about 100% CPU. That means it consumes one CPU core completely. If you configure htop to show the column “COMM” (which is the name from /proc/<pid>/comm), you’ll see PtyProcess Reap. First clue: pty, short for pseudo teletype or pseudo terminal, could be related to the terminal problem. We are on to something.

One common approach to find out what a Linux process is actually doing is to use strace to see which syscalls it makes. Maybe that gives some more info:

$ cat /proc/3371751/comm
PtyProcess Reap
$ strace -p 3371751
strace: Process 3371751 attached
close(522341805)                        = -1 EBADF (Bad file descriptor)
close(522341806)                        = -1 EBADF (Bad file descriptor)
close(522341807)                        = -1 EBADF (Bad file descriptor)
close(522341808)                        = -1 EBADF (Bad file descriptor)
close(522341809)                        = -1 EBADF (Bad file descriptor)
close(522341810)                        = -1 EBADF (Bad file descriptor)
close(522341811)                        = -1 EBADF (Bad file descriptor)
close(522341812)                        = -1 EBADF (Bad file descriptor)
close(522341813)                        = -1 EBADF (Bad file descriptor)
close(522341814)                        = -1 EBADF (Bad file descriptor)
...
^Cstrace: Process 3371751 detached

(exit strace with Ctrl+C).

So, what we can see here: this process is closing file descriptors, and a lot of them. But they all seem to be invalid, as the result is always “EBADF”… This doesn’t look like a process working under proper conditions. It looks like useless work.

By the way: if I let this terminal window sit there for a while, eventually the prompt will show up. So it just takes a really long time (we are talking about minutes) to open a new terminal window. And for every new (local) terminal window in IntelliJ IDEA, it takes this long time again.

Another thing to look at, since the problem appears in IntelliJ IDEA, is the log file of that program. It can be found via the menu “Help > Show Log in Files”; on the command line, for me it is: less ~/.cache/JetBrains/IdeaIC2024.1/log/idea.log.

There is only one line related to the terminal:

2024-06-21 09:02:09,283 [  24305]   INFO - #o.j.p.t.LocalTerminalDirectRunner - Started com.pty4j.unix.UnixPtyProcess in 134 ms from [/bin/bash, --rcfile, /home/andreas/.local/share/JetBrains/Toolbox/apps/intellij-idea-community-edition/plugins/terminal/shell-integrations/bash/bash-integration.bash, -i] in /home/andreas/PMD/source/pmd, [columns=228, rows=15], diff_envs={TERM=xterm-256color, TERMINAL_EMULATOR=JetBrains-JediTerm, TERM_SESSION_ID=3018e2f6-d542-429d-95df-61d9f6d4cab5, __INTELLIJ_COMMAND_HISTFILE__=/home/andreas/.cache/JetBrains/IdeaIC2024.1/terminal/history/pmd-main-history1}

There seems to be no other relevant log output. But at least we see something with “pty” again, this time com.pty4j.unix.UnixPtyProcess. Maybe that has something to do with “PtyProcess Reap”?

This information alone doesn’t help. Time for some internet searches, e.g. for the keyword “PtyProcess Reap”.

You can find some information about what a pty actually is at the Python library ptyprocess. Apparently, this is the way to go if you want to execute a command in “interactive” mode, within an active terminal. There is also an in-depth article linked, The TTY demystified, which gives detailed background information about terminals and what a pseudo terminal does today.

If we search on, we eventually find the GitHub project pty4j, which seems to be what IntelliJ IDEA is using. Since it is in the JetBrains organization, this is a strong indication that we are looking at the right thing.

Searching the issues there (maybe someone else has already reported such behavior), we find the following issue: 100% CPU when the _SC_OPEN_MAX is a big number #147. Sounds similar: 100% CPU usage. It also contains the observation we made earlier with strace: “… was trying to close a huge range of FDs”.

After seeing the suggested fix in faster all-fd close in child #124 (which incidentally has been hanging around since 2022), I got another clue what to look for: the upper bound of the loop is _SC_OPEN_MAX. This is described e.g. in sysconf(3):

OPEN_MAX - _SC_OPEN_MAX
    The maximum number of files that a process can have open at any time. Must not be less than _POSIX_OPEN_MAX (20).

So it seems the “PtyProcess Reap” process tries to close all possible file descriptors that it could possibly have opened, for whatever reason. And this upper limit is potentially much higher now?
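
To picture what that means, here is a minimal C sketch of such a close-everything loop (an illustration of the general pattern, not the actual pty4j source):

#include <unistd.h>

/* Close every file descriptor the child could possibly have inherited.
 * On Linux, sysconf(_SC_OPEN_MAX) returns the soft RLIMIT_NOFILE of
 * the calling process. With the bumped limit this is ~2^30, so the
 * loop issues about a billion close() syscalls, nearly all of which
 * fail with EBADF. */
static void close_all_possible_fds(void) {
    long max_fd = sysconf(_SC_OPEN_MAX);
    for (long fd = 3; fd < max_fd; fd++)   /* keep stdin/stdout/stderr */
        close((int)fd);
}

Even at a few million syscalls per second, such a loop runs for minutes, which matches the delay observed earlier.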

There is also a workaround suggested to lower this number: setting the limits for the process. A related issue from a different project is linked as well ([Bug]: 100% CPU usage when the open FD limit is huge #14177), which describes a similar problem and has more info on how to figure out the current limits. And interestingly, we’ll find the very same number there (1073741816).

The first step is to figure out the current limits. In a terminal, these are the values for me:

$ ulimit -Sa|grep open
open files                          (-n) 1024
$ ulimit -Ha|grep open
open files                          (-n) 1073741816

There is a soft limit, which is at 1024 for me, and a hard limit, which is at a very high number: 1_073_741_816 (that is 2^30 - 8, so almost 2^30).

These are the limits in a terminal window only. What are the limits of a running process? You can query the “proc” filesystem, if you know the PID. For my IntelliJ IDEA process, it looks like this:

$ cat /proc/3370401/limits | grep files
Max open files            1073741816           1073741816           files

This process uses the very big number for both the soft and the hard limit.

The suggested workaround is to lower these numbers. There is a utility to change these limits for a running process: prlimit. You can also query the current values of a process with it:

$ prlimit --nofile -p 3370401
RESOURCE DESCRIPTION                    SOFT       HARD UNITS
NOFILE   max number of open files 1073741816 1073741816 files

This is consistent with what the “proc” filesystem shows. With prlimit, you can also set the limit, e.g.

$ prlimit --nofile=1024:1073741816 -p 3370401
$ prlimit --nofile -p 3370401
RESOURCE DESCRIPTION              SOFT       HARD UNITS
NOFILE   max number of open files 1024 1073741816 files

Now we have an arbitrary soft limit of 1024 set for this single process, and opening a new terminal window in IntelliJ IDEA works as before. The workaround works.

To summarize the part until now:

  • IntelliJ IDEA uses pty4j to implement the terminal window functionality
  • This library tries to close all possible file descriptors in a simple for loop
  • For some as yet unknown reason, the upper limit of possible file descriptors of the process is very high
  • Workaround is to make this number small again, e.g. prlimit --nofile=1024:1073741816 -p <PID>

Unfortunately, I only remember that the terminal didn’t work after some Debian updates… So it’s time to dig into what could have caused this change.

Debian Update

I’m using Debian Testing, so there are always some updates. You can read in the log /var/log/dpkg.log which packages have been updated. Since I had already read that the limits for a process can be configured system-wide via /etc/security/limits.conf, or on a system that uses systemd via a configuration option called LimitNOFILE, I knew I had to look out for systemd updates.

And sure enough, there was some update:

$ grep "upgrade systemd:amd64" /var/log/dpkg.log
2024-06-01 19:43:13 upgrade systemd:amd64 255.5-1 256~rc3-2
2024-06-08 22:17:15 upgrade systemd:amd64 256~rc3-2 256~rc3-7
2024-06-16 20:51:55 upgrade systemd:amd64 256~rc3-7 256-1

I’m not sure anymore when exactly it started to fail (maybe since 2024-06-08?), but it definitely failed since 2024-06-16. I think I hoped that the issue would be fixed by this update.

So, something changed between 256~rc3-2 and 256~rc3-7. Let’s have a look at the changelog at Details of package systemd in trixie. On the right, there is a link to the Debian Changelog for this package. Note: this link will change when the next version is published, but it contains the changes of all previous versions.

So, let’s look at the entries up to “256~rc3-7” to see if there is something with “limit”:

...
systemd (256~rc3-3) unstable; urgency=medium
...
* Restore open files limit bump on boot. Broken packages ought to have
    been fixed by now. (Closes: #1029152)

This sounds promising, and I think this is the relevant change that caused the terminal problem to appear. It references Debian bug #1029152, which has been fixed with commit 99066f93. That commit changes how systemd is built for Debian: it removes the build option -Dbump-proc-sys-fs-nr-open=false.

Time to read up on what this does and what this is all about in systemd. Searching the systemd repository on GitHub, I found the following issues:

  • bump RLIMIT_NOFILE #10244: This was already done in 2018, so it didn’t change with the Debian update, but it gives some pointers and explanations. This seems to be the PR that introduced the build option that Debian had disabled (“bump-proc-sys-fs-nr-open”). In the changelog of systemd (the file NEWS) this is described in more detail: back then they set it to 256k, but only the hard limit. There are also some pointers to an issue with the select() syscall.
  • meson: let’s bump RLIMIT_NOFILE hard limit to 512K #10780: A month later this limit was raised to 512k.
  • The current default limit is defined in meson.build#L102 and it is still at 512k. Note: This is lower than 1073741816 …
  • Why is soft LimitNOFILE 1024 by default? #25478 - this is an interesting question which gives a lot of background information. Especially the linked article File Descriptor Limits is worth reading. It explains the difference between the soft and the hard limit, and why you want to keep the soft limit at 1024 and only raise the hard limit (see the small example after this list).
  • systemd is basically the init process, and all other processes in the system are forked from it. The limits are passed on unchanged (if nothing is configured) or set according to the configuration (see systemd-system.conf).
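
The select() problem mentioned above also explains the “keep the soft limit at 1024” rule. A small C illustration (my example, not from the linked articles): an fd_set is a fixed-size bitmap of FD_SETSIZE bits, typically 1024, so file descriptors numbered 1024 or higher cannot be used with select() at all.

#include <stdio.h>
#include <sys/select.h>

int main(void) {
    /* FD_SETSIZE is a compile-time constant, usually 1024. Calling
     * FD_SET() with an fd >= FD_SETSIZE is undefined behavior. A
     * process that gets handed fds above 1023 and passes them to
     * select() therefore breaks in subtle ways. Programs that know
     * they only use poll()/epoll can safely raise their own soft
     * limit to the hard limit. */
    printf("FD_SETSIZE = %d\n", FD_SETSIZE);
    return 0;
}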

Once I knew what to look for (the limits of the terminal process I’m opening), I could easily retest this with a fresh install of Debian:

  • Debian Stable 12.5 has the following numbers:
    • gnome-terminal with bash: prlimit --nofile -p $PPID reports NOFILE soft/hard limits of 1048576 / 1048576
    • cat /proc/sys/fs/nr_open: 1048576
    • cat /proc/sys/fs/file-max: 9223372036854775807
    • systemctl --version: 252
  • Debian Testing (I did an apt full-upgrade, see DebianTesting):
    • gnome-terminal with bash: prlimit --nofile -p $PPID reports NOFILE soft/hard limits of 1073741816 / 1073741816
    • cat /proc/sys/fs/nr_open: 1073741816
    • cat /proc/sys/fs/file-max: 9223372036854775807
    • systemctl --version: 256

With the upgrade, the soft and hard limits changed to the maximum number of available descriptors (/proc/sys/fs/nr_open). It is interesting to see that the soft limit is the same as the hard limit (independently of the systemd update), so somewhere the processes themselves probably raise their soft limit to their hard limit.
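
Such raising is something every process can do on its own. Here is a minimal sketch (assuming Linux/glibc) of bumping the soft limit up to the hard limit, which is presumably what gnome-terminal, the JVM, and others effectively do:

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;

    /* Read the current RLIMIT_NOFILE limits of this process. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 1;
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    /* An unprivileged process may raise its soft limit up to the
     * hard limit at any time, see setrlimit(2). */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 1;
    return 0;
}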

Another observation: the number was never 512k, which systemd documents to be the default value… So maybe that’s actually a bug in systemd?

At least we now know what changed and how it affected IntelliJ IDEA. By the way, there are other software products that are also affected, e.g. CUPS.

Only IntelliJ IDEA?

The IntelliJ bug tracker also has some info about this problem. Some of these bugs were already very old and closed, but got new comments.

But is it only IntelliJ IDEA that is affected by this Debian change?

It turns out that the built-in terminal in Eclipse has the same problem. There the process is called “Spawner Reaper”, and strace shows the same behavior (closing lots of file descriptors). The terminal support in Eclipse (the feature org.eclipse.tm.terminal.feature) is part of Eclipse CDT™ (C/C++ Development Tooling), and indeed the same problem exists in org.eclipse.cdt.core.native/native_src/unix/exec_pty.c#L109-L117: the familiar for loop up to sysconf(_SC_OPEN_MAX) that closes all the file descriptors.

Interestingly, Visual Studio Code doesn’t have this problem. This IDE also doesn’t use Java; maybe the issue only surfaces for Java programs?

There is one additional pointer: when systemd increased the file handle limits, there was a problem with Java: JDK-8150460. Apparently, until 2020, Java allocated memory for every possible file descriptor, which easily caused memory issues when the limit was huge… But that seems to be fixed now. It seems, however, that the Java process also uses the hard limit and raises its soft limit up to the hard limit (as in the sketch above).

What about Ubuntu 24.04 (noble)? Let’s see:

$ prlimit --nofile -p $PPID
RESOURCE DESCRIPTION                 SOFT    HARD UNITS
NOFILE   max number of open files 1048576 1048576 files
$ systemctl --version
systemd 255 (255.4-1ubuntu8)
$ cat /proc/sys/fs/nr_open
1048576

So for now, it doesn’t seem to be a problem. But why? It seems the bump option is also disabled for Ubuntu’s systemd package: the Ubuntu package is based on the Debian package, but on an older version where the build option is still disabled (see debian/rules#82). It is probably only a matter of time until this Debian change lands in Ubuntu as well.

What is it about these pty libraries?

One thought about what these libraries do: they use the fork() system call to create a new child process. This duplicates the calling process, which means that all open files (the file descriptors) are duplicated as well. Quoting fork(2):

The child inherits copies of the parent’s set of open file descriptors.

So, what you usually do after forking and before exec()ing a new program is to clean up all the inherited resources that you don’t need. The new program for the terminal is usually a shell (e.g. bash) and is completely separate from the parent process, which here is the Java IDE, Eclipse or IntelliJ IDEA. These IDEs might have many files open: all the jar files of the dependencies to load classes from, additionally the jar files of user-installed plugins, and last but not least the files that are opened in editor windows.
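
Schematically, the sequence looks like this (a simplified sketch, not the actual library code; the real implementations also set up the pty with openpty()/login_tty(), which is omitted here):

#include <sys/types.h>
#include <unistd.h>

void close_fds_from(unsigned int first);   /* see the sketch below */

pid_t spawn_shell(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: it inherited every open fd of the IDE process.
         * Clean up before handing control over to the shell. */
        close_fds_from(3);   /* keep stdin/stdout/stderr */
        execlp("bash", "bash", "-i", (char *)NULL);
        _exit(127);          /* only reached if exec failed */
    }
    return pid;              /* parent: the child's pid */
}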

I guess that the IDE (the Java program) doesn’t have a complete overview of all the opened files, since some parts are out of its control (e.g. plugins). The JVM would know, but running the shell for the terminal through java.lang.ProcessBuilder would start bash without the pty semantics, which means bash wouldn’t believe that it is run interactively and would behave differently. Theoretically, the ProcessBuilder implementation needs to do very similar things (fork, cleanup, exec). And indeed, it also closes the file descriptors in childproc.c#L70, but it only closes the actually open fds and not every possible fd. The same has now been implemented in pty4j: the preferred way is the new syscall to close a whole range of fds at once (close_range); if that is not available, the fallback is to close only the open fds, which is not very portable (it only works if the OS provides a /proc/self/fd directory). The last fallback is the old way, which could potentially still take a long time.
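
Putting these strategies together, here is a sketch of the layered approach (my illustration of the idea, not the pty4j code; it assumes Linux and glibc 2.34+ for the close_range() wrapper, older systems would use syscall(SYS_close_range, ...)):

#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

void close_fds_from(unsigned int first) {
    /* 1. Preferred: close the whole range with a single syscall
     *    (available since Linux 5.9). */
    if (close_range(first, ~0U, 0) == 0)
        return;

    /* 2. Fallback: close only the fds that are actually open, by
     *    listing /proc/self/fd. Not portable beyond procfs systems. */
    DIR *dir = opendir("/proc/self/fd");
    if (dir != NULL) {
        struct dirent *entry;
        int skip = dirfd(dir);   /* don't close the fd we iterate with */
        while ((entry = readdir(dir)) != NULL) {
            int fd = atoi(entry->d_name);   /* "." and ".." become 0 */
            if (fd >= (int)first && fd != skip)
                close(fd);
        }
        closedir(dir);
        return;
    }

    /* 3. Last resort: the old, slow loop over every possible fd. */
    long max_fd = sysconf(_SC_OPEN_MAX);
    for (long fd = first; fd < max_fd; fd++)
        close((int)fd);
}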

Looking at the two libraries, pty4j and eclipse-cdt, the two C files look very similar. Have a look at the copyright notice at the beginning: it seems that these two originated from the same source…

pty4j explains that it used the code from a library called ELT (Eclipse Local Terminal), which was a plugin for Eclipse that provided a local terminal window. ELT’s exec_pty.c is available in one of the repositories that exported the archived Google Code ELT repo to GitHub.

Possible workarounds

So, until the fd-closing issues are fixed in all pty libraries, there are two simple workarounds:

As mentioned above, you can set the limit for a running process with:

prlimit --nofile=1024:1073741816 -p $(pidof $HOME/.local/share/JetBrains/Toolbox/apps/intellij-idea-community-edition/jbr/bin/java)

This workaround, however, is only temporary: after the next restart of the application, you need to set it again.

Another way is to configure systemd to set lower limits for the processes it starts. For me, it worked by editing /etc/systemd/user.conf and adding the line:

DefaultLimitNOFILE=1024:524288

Setting it in /etc/systemd/system.conf had no effect, but in /etc/systemd/user.conf it seems to work.

Summary

As seen here, bugs can hide for a long time. One bug (e.g. in pty4j) only surfaced once another bug (e.g. in Debian) was fixed. That’s the nature of complex systems: there are many parts that play together and influence each other. It could also be that two bugs cancel each other out. And it’s hard to predict the consequences of one change on other parts.

And as always: you learn the most about a system when it fails. If the system works, you have no incentive to understand how it works; I wouldn’t have looked under the hood, for example. And that’s how you remove the magic: by learning how the stuff really works.

Update

  • Filed issue #835 for eclipse-cdt as well.

Update 2024-07-05