NAMD performance depends heavily on the underlying hardware and on the options used when the binary was compiled. That said, here are a few things you can check:
Try adding +isomalloc_sync, along with +idlepoll, to the launch command in your script.
Make sure the binary is built with the correct MPI bindings for your system; if it is not, you may have to recompile it.
Check whether the available GPU and CPU cores are being fully utilized.
Most importantly, check that you are using reasonable values for your time step and for your output (write-out) frequencies.
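
For the last point, here is a rough sketch of how you could scan a NAMD configuration file for a large time step or very frequent output. The two thresholds and the helper names (parse_namd_config, check_settings) are my own assumptions for illustration, not NAMD recommendations, and the parser only handles simple "keyword value" lines, so values set through Tcl variables are skipped.

```python
import re

# Hypothetical thresholds for illustration only; tune them for your own system.
MAX_TIMESTEP_FS = 2.0       # assumed upper bound for the time step (fs)
MIN_OUTPUT_INTERVAL = 1000  # assumed minimum number of steps between writes

# NAMD configuration keywords that control how often output is written
# (compared in lowercase, since NAMD keywords are case-insensitive).
OUTPUT_KEYWORDS = ("dcdfreq", "xstfreq", "restartfreq",
                   "outputenergies", "outputtiming")


def parse_namd_config(text):
    """Return {keyword: value} for simple 'keyword value' lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Accept both 'keyword value' and 'keyword = value'.
        parts = re.split(r"[\s=]+", line, maxsplit=2)
        if len(parts) >= 2:
            settings[parts[0].lower()] = parts[1]
    return settings


def check_settings(settings):
    """Return a list of warnings based on the assumed thresholds above."""
    def as_number(key):
        try:
            return float(settings[key])
        except (KeyError, ValueError):
            return None  # keyword missing, or value is not a plain number

    warnings = []
    timestep = as_number("timestep")
    if timestep is not None and timestep > MAX_TIMESTEP_FS:
        warnings.append(f"timestep {timestep} fs exceeds {MAX_TIMESTEP_FS} fs")
    for key in OUTPUT_KEYWORDS:
        interval = as_number(key)
        if interval is not None and 0 < interval < MIN_OUTPUT_INTERVAL:
            warnings.append(
                f"{key} {int(interval)} writes more often than every "
                f"{MIN_OUTPUT_INTERVAL} steps"
            )
    return warnings


# Made-up example configuration, not taken from a real run.
sample = """
timestep        2.0
dcdfreq         10      # very frequent trajectory writes
outputEnergies  = 5000
restartfreq     5000
"""

for warning in check_settings(parse_namd_config(sample)):
    print("WARNING:", warning)
```

To use it on a real run, read your own configuration file into a string and pass it to parse_namd_config in place of the sample text.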