You could do this fairly easily with any rendering software that supports orthographic views (which is most, maybe all, these days).
This looks fairly simple. The modeling itself could be done in any software. It looks specifically like Revit to me, but it could be done with more user-friendly software like SketchUp. You could even make the more complex shapes, like the chairs and couch, in SketchUp using a subdivision modeling plugin.
The rendering itself could also have been done in many programs, but it looks to me like Enscape or Lumion. It has a little bit of that “video gamey” look to it, which means it’s likely one of the GPU-based real-time rendering programs.
As for how the image is produced: what you’re looking at is the same model the architect/designer saw unrendered on their screen. Everything in the image is actually modeled; none of it is faked in afterwards. The key to a good rendering is to have a good base model underneath.
If you were looking to produce this kind of image, I’d say model what you want to see in SketchUp, then render in real time with Enscape or Lumion. If you want to go the extra mile, use an offline (traditionally CPU-based) renderer like Thea or V-Ray once you’ve made sure the textures and whatnot are all how you’d like.
At this point you’re already 90% of the way there; all that’s left is to Photoshop in some lighting effects or textures if needed.