If an end-node is offline or otherwise unresponsive, and you send an “ON” command using a ‘single-topic’ architecture, then this would force things out of sync. All the rest of the system will wrongly assume that the device is ON.
Yes, for a little while, but only for low-importance devices. I’m not sure why that’s a bad thing, but I’m pretty sure that if it is, there’s some nasty complexity in your layout that’s gonna come back and bite you.
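To make the disagreement concrete, here’s a toy model of what an interface would display after sending “ON” to a node that is online versus offline, under each scheme. It’s plain Python with no real broker; the `Broker` class and the `lamp`, `lamp/set` and `lamp/state` topic names are hypothetical stand-ins, not any library’s API.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for an MQTT broker: keeps the last payload per topic and fans out."""

    def __init__(self):
        self.last = {}
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        self.last[topic] = payload
        for callback in self.subscribers[topic]:
            callback(topic, payload)


def single_topic(device_online):
    """One 'lamp' topic: the UI displays whatever was last published there."""
    broker = Broker()
    relay = {"on": False}
    if device_online:
        broker.subscribe("lamp", lambda topic, payload: relay.update(on=payload == "ON"))
    broker.publish("lamp", "ON")  # the UI's command doubles as the displayed state
    displayed = broker.last.get("lamp", "UNKNOWN")
    return displayed, "ON" if relay["on"] else "OFF"


def split_topic(device_online):
    """Command on 'lamp/set'; the UI only displays what the node reports on 'lamp/state'."""
    broker = Broker()
    relay = {"on": False}

    def device(topic, payload):
        relay["on"] = payload == "ON"
        broker.publish("lamp/state", "ON" if relay["on"] else "OFF")

    if device_online:
        broker.subscribe("lamp/set", device)
    broker.publish("lamp/set", "ON")
    displayed = broker.last.get("lamp/state", "UNKNOWN")
    return displayed, "ON" if relay["on"] else "OFF"


for online in (True, False):
    # Each result is (what the interface shows, what the relay is actually doing)
    print(f"online={online}: single={single_topic(online)}, split={split_topic(online)}")
```

With the node offline, the single-topic interface shows ON while the relay is still OFF, and the split-topic interface never receives a state report at all. That’s the desync in question. If the command is published with the retain flag, the node also receives it when it reconnects and resubscribes, which is one way the “for a little while” can resolve itself.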
Take a light, for example. My bedroom lamp is a Sonoff device, and I’ve had it take almost a minute to come on or go off, but it nonetheless has, every single time in the couple of months it’s been connected up: it comes on in the morning, goes off when I leave the room, comes back on when I re-enter, and goes off when I sleep. The only weak point is the Bluetooth on my phone that I use for presence detection, which can cause it to switch off while I’m still present (it also used to occasionally lock my computer in the middle of whatever I was doing). If I switch it in the interface, it always takes at least a second or two, but it does come on. Whenever I’ve checked the Sonoff app while I’m out, it’s always shown as off.

And that’s with a hideously convoluted path, too: the interface and controller scripts (literally written in bash) set an MQTT topic; another controller script monitors that topic and issues a web request to IFTTT, which passes it on to Sonoff (or rather eWeLink, or whatever it’s called), and from there it goes back to the device via “the cloud” and Wi-Fi, which then switches on the light. (I haven’t gotten around to trying out Tasmota; I’m waiting until I get a second Sonoff to screw up, rather than the one that’s in actual use.) If that process breaks at any point after the MQTT server, I’m SOL anyhow, but even so, so what? It’s a light, no huge drama; that’s what the flashlight button on my phone is for. The local loop is also vastly quicker than the Sonoff leg of the journey. And for the most part, if I press the light switch and it indicates the light has changed but it hasn’t, I’m going to know anyhow; if I don’t, I’m probably not going to care for at least a little while either, which is plenty of time for it to catch up. So yeah, I’ve probably got a bit of a “she’ll be right” attitude going on there, at least partly enforced by the present situation.
Also, don’t forget MQTT Will messages: I use those with every MQTT-connected device, along with a “hello” message whenever it (re)connects, and in general the setup seems to be quite self-healing. But I do agree that the failure-response time is perhaps a little too long for immediate-response edge nodes like a simple light, because you have to wait for the MQTT server to realise there’s a problem. Of course, that only applies if the network or node fails right then; if it was already down, the interface will already be displaying a failure state.
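For anyone who hasn’t used them: a node registers its Will message when it connects, and the broker publishes it on the node’s behalf if the connection drops without a clean disconnect. Below is a rough sketch of the listening side in plain Python (no broker library), assuming a hypothetical `nodes/<id>/status` topic where nodes publish `hello` on (re)connect and register `offline` as their Will. It also shows where the slow failure case comes from: under MQTT 3.1.1, the broker only gives up on a silent client after one and a half times its keep-alive interval.

```python
# MQTT 3.1.1: if the broker hears nothing from a client for 1.5x its keep-alive,
# it drops the connection and publishes that client's Will message.
KEEPALIVE_GRACE = 1.5


def worst_case_will_delay(keepalive_s: float) -> float:
    """Upper bound (seconds) from a node's last packet to its Will being published."""
    return KEEPALIVE_GRACE * keepalive_s


class PresenceTracker:
    """Marks nodes online on 'hello' and offline on their Will payload.

    The topic layout (nodes/<id>/status) and payloads ('hello'/'offline')
    are hypothetical examples.
    """

    def __init__(self) -> None:
        self.online: dict[str, bool] = {}

    def on_message(self, topic: str, payload: str) -> None:
        parts = topic.split("/")
        if len(parts) != 3 or parts[0] != "nodes" or parts[2] != "status":
            return  # not a status message; ignore it
        if payload == "hello":
            self.online[parts[1]] = True
        elif payload == "offline":
            self.online[parts[1]] = False

    def is_online(self, node_id: str) -> bool:
        return self.online.get(node_id, False)


tracker = PresenceTracker()
tracker.on_message("nodes/lamp1/status", "hello")
print(tracker.is_online("lamp1"))  # True
tracker.on_message("nodes/lamp1/status", "offline")
print(tracker.is_online("lamp1"))  # False
print(worst_case_will_delay(15))   # 22.5
```

Shortening the keep-alive shrinks that window, at the cost of more ping traffic from every node.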
The question is, what exactly are you protecting against with that status response?
The only benefit I can see is that you get a bit of an impromptu network latency test thrown in for free. Unless something really weird happens within the edge node that causes it to keep responding on the network but still not work… and in that case, I’d say the chances are pretty good that it has also sent back the response, and still hasn’t done anything. Anything after the firmware (including the output to the relay, and the relay itself) won’t be detected either way, and anything before it still will be; there’s just a little delay in my case before it shows up as a problem, and that will eventually be picked up and sent to my phone as an alert, for example.
And in the case of a light with more than one control interface, I consider the immediate notification to be better, and less likely to cause race conditions, which are another source of desync: if two interfaces emit opposing commands at the same time, things could get a little ugly.
My ‘garage controller’ (an Arduino-based system that I put together) uses a single topic, and it acts funny. I send an “Open” command via MQTT and the icon instantly gets messed up. It starts correctly as a “closed” icon, but when I tap the icon to open it, the icon switches immediately and directly to “fully open”, because openHAB sees the command instantaneously. Then the request reaches my controller and the physical door starts to open. A moment later it clears the “closed” prox sensor and updates MQTT, and the icon changes to “50% open”. A few seconds later, when it hits the “open” prox, the icon changes a second time, to the “fully open” one.
That sounds like the messages are being interpreted wrongly. In my layout, the interface interprets the “Open command” as an “in progress” status and should display it as such, so if it showed as “fully open” immediately, then something’s wrong. Or is that a limitation of the control interface you’re using (bearing in mind I’m still using a fairly crufty web page and some JavaScript)? For a while I had a mockup that used a random-ish delay to represent a door opening (garage doors and lights being the two most common cases I can think of), and it displayed correctly and reliably with my single-topic scheme as I’ve presented it: when it received the “Open command” reflected back from MQTT, it showed as “opening”, and when the door was done, it sent back “Opened status”, which shows as “open” on the page. That was one of a couple of mockups I tossed together for various devices to work through their communication needs, and I was tormenting it by yanking out the network or power connections at odd times, too.
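Here’s roughly what I mean by treating the reflected command as “in progress”, sketched in Python rather than the JavaScript on my page. Only “Open command” and “Opened status” come from the mockup; the other payload strings and the `display_state` helper are hypothetical. The point is that a command maps to a transitional state, and “open” is only shown once the node reports it.

```python
# Hypothetical payloads sharing a single "garage/door" topic: commands from the
# interface and statuses from the edge node both arrive here, and the
# interface maps each one to the icon it should show.
DISPLAY_FOR_MESSAGE = {
    "Open command": "opening",    # command reflected back: in progress, not done
    "Close command": "closing",
    "Opening status": "opening",  # closed limit switch released
    "Closing status": "closing",  # open limit switch released
    "Opened status": "open",      # open limit switch reached
    "Closed status": "closed",    # closed limit switch reached
}


def display_state(current: str, payload: str) -> str:
    """Return the icon state after receiving payload; unknown payloads change nothing."""
    return DISPLAY_FOR_MESSAGE.get(payload, current)


state = "closed"
for message in ("Open command", "Opening status", "Opened status"):
    state = display_state(state, message)
    print(f"{message} -> {state}")
# Open command -> opening
# Opening status -> opening
# Opened status -> open
```

Under this mapping, the “jumps straight to fully open” behaviour would only happen if the interface treated the command payload itself as a final state.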
You have made me think, though: the failure message presently comes from a timer on the edge node, which is perhaps not the best place for it. But I was also assuming the node would emit a message when the Closed limit switch broke contact, so you’d get an Opening message anyhow, in addition to the Open command, very shortly after the door started to open; that’s much better than the edge node merely acknowledging the command. Again, if the network and control system are intact, is there any reason the edge node wouldn’t receive and act on the command? If the edge node itself is down, what are the chances of it keeping a functional MQTT connection that holds back the Will message (and the subsequent alert)? And if it’s just messed up, all bets are off anyhow unless you’ve got an entirely separate device doing the monitoring (for example, the output might have become an input, and the internal pull-up resistor isn’t enough to switch the controlling relay).
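If the failure timer moves off the edge node, one option is to run it wherever the interface or controller scripts live: after sending a command, expect a specific status (for the door, the Opening message from the Closed limit switch) within some window, and raise a fault if it doesn’t arrive. This is a minimal sketch, not what’s running now; the `CommandWatchdog` class, the timeout and the payload strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingCommand:
    command: str
    sent_at: float
    expected_status: str


class CommandWatchdog:
    """Controller-side timer: flags a fault if the expected status doesn't arrive in time.

    Timestamps (seconds) are passed in explicitly so the logic can be tested
    without a real clock.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.pending: Optional[PendingCommand] = None

    def command_sent(self, command: str, expected_status: str, now: float) -> None:
        self.pending = PendingCommand(command, now, expected_status)

    def status_received(self, status: str) -> None:
        if self.pending and status == self.pending.expected_status:
            self.pending = None  # loop closed: the node confirmed the change

    def check(self, now: float) -> Optional[str]:
        """Return a fault description if the pending command has timed out, else None."""
        if self.pending and now - self.pending.sent_at > self.timeout_s:
            p = self.pending
            self.pending = None
            return f"no '{p.expected_status}' within {self.timeout_s}s of '{p.command}'"
        return None


watchdog = CommandWatchdog(timeout_s=2.0)
watchdog.command_sent("Open command", "Opening status", now=0.0)
watchdog.status_received("Opening status")
print(watchdog.check(now=5.0))   # None: confirmed in time

watchdog.command_sent("Open command", "Opening status", now=10.0)
print(watchdog.check(now=12.5))  # no 'Opening status' within 2.0s of 'Open command'
```

Because the timer lives on the controller, it also fires when the node is dead or unreachable, which a timer on the node itself can’t do.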
To clear up any confusion, all of that follows from a philosophy I had in mind: for a light, all I really need to know is whether the command has been received by MQTT, and the light will change when it gets the chance. I can’t think of any state in which it would just silently not turn off (a better example than not turning on, since for that the globe could be dead, and with split or single topics, you’re still not going to have a clue). If it’s something important, like a garage door, then I’ll close the loop either within the device (the limit switches emitting status messages) or through a second device entirely (as with my desk clock and its light level sensor; I’m working on using it to indicate when the Sonoff gets around to flicking the relay). So in general, my philosophy was one of optimism, with fallback. The split topic seems more pessimistic, and potentially gives a false sense of security…
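For the desk-clock idea, the second device only needs to turn light readings into an on/off guess and compare that with the last command. A sketch, assuming the clock reports a lux value; the `LuxLampMonitor` class and its thresholds are made up and would need calibrating against the room. The gap between the two thresholds (hysteresis) stops a reading hovering near a single cutoff from flipping the state back and forth.

```python
class LuxLampMonitor:
    """Infers a lamp's real state from a light-level sensor, with hysteresis."""

    def __init__(self, on_above: float, off_below: float) -> None:
        if off_below >= on_above:
            raise ValueError("off threshold must sit below the on threshold")
        self.on_above = on_above
        self.off_below = off_below
        self.lamp_on = False

    def update(self, lux: float) -> bool:
        """Feed one reading; state only changes once a threshold is clearly crossed."""
        if not self.lamp_on and lux > self.on_above:
            self.lamp_on = True
        elif self.lamp_on and lux < self.off_below:
            self.lamp_on = False
        return self.lamp_on


def confirmed(commanded_on: bool, monitor: LuxLampMonitor) -> bool:
    """True once the sensed state matches what was commanded."""
    return monitor.lamp_on == commanded_on


monitor = LuxLampMonitor(on_above=80, off_below=40)
for lux in (10, 60, 120, 70, 30):
    print(lux, monitor.update(lux))
# 10 False / 60 False / 120 True / 70 True / 30 False
print(confirmed(commanded_on=False, monitor=monitor))  # True
```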
Being the OCD belt-and-suspenders type that I am, the delay between a failure and the failure indication is now kinda bugging me, so I’ll probably think about going split topic like everyone else, just to close that hole… eventually. The single-topic method also used to save me just enough very precious bytes in my dinky little EtherTen’s firmware, but that’s not really an issue now that it’s been replanted into an EtherMega (of which only two of its huge number of ports are actually used). And with my contemplated switch to RS485, there won’t be that great big Ethernet library anyhow (which I can’t use in the project because it’s now even bigger). Power over RS485… good ol’ Po…S? Getting MQTT running over 485 is going to be kinda fun, too… But Ethernet is just such a gruesome heavyweight, and quite unnecessary…
That leaves one thing from my original post (since part of my original question(s) has been shipped off to your new post): the MQTT topic tree. Do people program the edge devices with a common name, or do we give them an ID and do the translation elsewhere? For example, Jon’s old Arduino light switches were programmed with a common name, whereas his newer ones use the ID principle, as I understand them. My LED strip is named “5410EC4C7221”, which is just its built-in MAC address. That goes into a mapping file that lets me refer to it transparently either by that number or by the common name “ledstrip”, and I can change the common name at will without having to change the firmware (or vice versa: if I need to update the firmware, every individual device doesn’t need its own private build). And does anyone use a discovery scheme? My nodes emit a “hello” message (and re-emit it if they subsequently receive one, too) that includes the device’s firmware name and version, its IP address, and any extra info that might be handy (like the type and number of LEDs in the LED strip).
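The ID-plus-mapping approach boils down to a two-way lookup that lives outside the firmware, plus somewhere to keep what each node announced in its “hello”. A sketch in Python, assuming the mapping file has already been loaded into a dict of ID → common name; the `DeviceRegistry` class, the `devices/<id>/set` topic layout and the hello fields (and their example values) are hypothetical. Topics stay keyed on the fixed ID, so a rename only touches the table.

```python
import json


class DeviceRegistry:
    """Maps fixed device IDs (e.g. MAC addresses) to common names, and stores 'hello' info."""

    def __init__(self, aliases: dict[str, str]) -> None:
        self.id_to_alias = dict(aliases)
        self.alias_to_id = {alias: dev_id for dev_id, alias in aliases.items()}
        self.info: dict[str, dict] = {}

    def resolve(self, name_or_id: str) -> str:
        """Accept either the device ID or its common name; return the device ID."""
        return self.alias_to_id.get(name_or_id, name_or_id)

    def command_topic(self, name_or_id: str) -> str:
        # Topics are keyed by the fixed ID, never by the alias.
        return f"devices/{self.resolve(name_or_id)}/set"

    def rename(self, dev_id: str, new_alias: str) -> None:
        """Change the common name without touching the device's firmware."""
        old = self.id_to_alias.get(dev_id)
        if old is not None:
            del self.alias_to_id[old]
        self.id_to_alias[dev_id] = new_alias
        self.alias_to_id[new_alias] = dev_id

    def on_hello(self, dev_id: str, payload: str) -> None:
        """Record what a node announced about itself (discovery)."""
        self.info[dev_id] = json.loads(payload)


registry = DeviceRegistry({"5410EC4C7221": "ledstrip"})
print(registry.command_topic("ledstrip"))  # devices/5410EC4C7221/set
registry.on_hello("5410EC4C7221",
                  '{"firmware": "ledstrip", "version": "1.0", "ip": "192.168.1.50", "leds": 60}')
registry.rename("5410EC4C7221", "bedhead")
print(registry.resolve("bedhead"))         # 5410EC4C7221
```

One firmware image can then serve every identical node, since nothing in it needs to know its common name.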
I have been considering adding the ability to set a “board name”, which I believe Arduino expects to be stored at the end of EEPROM, and using that in place of the device ID… But I’m unsure whether that complexity is at all useful, and it would take some extra work to handle: not only setting and updating it, but also just using it in the MQTT topic. I’d need to manage the subscriptions, and I believe it’s written backwards in EEPROM (?), which makes it harder to use, etc.