CUDA Programming on Hybrid GPU Machine

Hello all. I'm writing some CUDA programs for a personal project, but I'm running into trouble right out of the gate. I can successfully compile a Hello World program that should print the "Hello World" string from the Nvidia GPU, but I am not getting that result.

Here is the CUDA C program, which should be very simple.

#include <stdio.h>

__global__ void hello() {
    printf("Hello World\n");
}

int main() {
    hello<<<1,1>>>();
    return 0;
}

Then I compile with: nvcc
No problems there.

When I run the executable with ./a.out, there's no output.
I noticed that my Nvidia card was not powered on, so I used primusrun ./a.out to activate it and run on the dGPU, but I still get no output.

My guess is that this has to do with my dual-GPU configuration, an area I'm not very familiar with, so that's probably the source of my bug.

My system runs Manjaro GNOME minimal, with bumblebee and bbswitch (though I'm not sure I set them up properly). It has Intel integrated graphics and a GTX 960M.

Any help would be appreciated.

After looking into the problem some more, I have found a solution.

The device-side printf output is stored in a buffer and is only flushed at certain points.
One way to trigger a flush is to call cudaDeviceSynchronize() after launching the hello() kernel.

With that change, I was able to print Hello World successfully.
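
For anyone who lands here with the same problem, here is a minimal sketch of the corrected program. The cudaDeviceSynchronize() call is the actual fix. The error checks around it are an addition that was not in my original code; I include them on the assumption that they help on a hybrid setup, since a kernel launch does not report failure by itself and a missing or powered-down dGPU would otherwise also show up as "no output".

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello() {
    printf("Hello World\n");
}

int main() {
    // Kernel launches are asynchronous: control returns to the host
    // before the kernel has necessarily run.
    hello<<<1, 1>>>();

    // Added check (not in the original program): report launch errors,
    // e.g. when no usable CUDA device is visible to the runtime.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Wait for the kernel to finish. This is also one of the points at
    // which the device-side printf buffer is flushed to stdout.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Synchronize failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    return 0;
}
```

On a bumblebee setup like mine, this would still be run through primusrun (or optirun) so that the dGPU is powered on. If the program prints an error instead of Hello World, the problem is more likely in the driver or GPU switching setup than in the code.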
