Discussion:
[PATCH] Support for MO direction 'group = smsbox-route' reloading (aka graceful restart)
Stipe Tolj
2018-10-01 12:56:33 UTC
Permalink
Hi all,

the reloading of the config (aka graceful restart) supports the
on-th-fly manipulation of the upstream SMSC groups and the corresponding
routing constraints for them.

What is doesn't support is the on-the-fly changing of the 'group =
smsbox-route' MO direction routing.

The attached patch adds this capability.

Please review. If no objections arise I will commit this within the next
days.

Cheers,
Stipe
--
Best Regards,
Stipe Tolj

-------------------------------------------------------------------
Düsseldorf, NRW, Germany

Kannel Foundation tolj.org system architecture
http://www.kannel.org/ http://www.tolj.org/

stolj at kannel.org st at tolj.org
-------------------------------------------------------------------
Stipe Tolj
2018-10-15 13:10:36 UTC
Permalink
Post by Stipe Tolj
Hi all,
the reloading of the config (aka graceful restart) supports the
on-th-fly manipulation of the upstream SMSC groups and the corresponding
routing constraints for them.
What is doesn't support is the on-the-fly changing of the 'group =
smsbox-route' MO direction routing.
The attached patch adds this capability.
Please review. If no objections arise I will commit this within the next
days.
committed to svn trunk, r5269.
--
Best Regards,
Stipe Tolj

-------------------------------------------------------------------
Düsseldorf, NRW, Germany

Kannel Foundation tolj.org system architecture
http://www.kannel.org/ http://www.tolj.org/

stolj at kannel.org st at tolj.org
-------------------------------------------------------------------
a***@kannel.org
2018-10-15 15:33:23 UTC
Permalink
Hi Stipe,

sorry I was not available earlier


just checked your patch and see man many issues


1.

+static void init_smsbox_routes(Cfg *cfg, int reload)

you use continue if config is wrong

but you already destroyed previous configuration

- /* send new config to clients */
+ gw_rwlock_wrlock(smsbox_list_rwlock);
+ dict_destroy(smsbox_by_smsc);
+ dict_destroy(smsbox_by_receiver);
+ dict_destroy(smsbox_by_smsc_receiver);
+ smsbox_by_smsc = dict_create(30, (void(*)(void *)) octstr_destroy);
+ smsbox_by_receiver = dict_create(50, (void(*)(void *)) octstr_destroy);
+ smsbox_by_smsc_receiver = dict_create(50, (void(*)(void *)) octstr_destroy);
+ init_smsbox_routes(cfg, 1);
+ gw_rwlock_unlock(smsbox_list_rwlock);

That means, you have inconsistency


2. if I see reload configs without a lock
 Uhh I can construct soo many cases, e.g. smsc is renamed in new config but still used in the old smsbox group, etc. etc.

int bb_graceful_restart(void)
{
- return smsc2_graceful_restart();
+ Cfg *cfg;
+
+ info(0, "Reloading configuration resource `%s'.", octstr_get_cstr(cfg_filename));
+ cfg = cfg_create(cfg_filename);
+ if (cfg_read(cfg) == -1) {
+ error(0, "Error processing configuration resource `%s'. Continue with existing configuration.",
+ octstr_get_cstr(cfg_filename));
+ return -1;
+ }
+
+ smsc2_graceful_restart(cfg);
+ smsbox_restart(cfg);
+ return 0;
}


Reload has to be atomic operation otherwise you will have some time inconsistent state.

Thanks,
Alex
Post by Stipe Tolj
Post by Stipe Tolj
Hi all,
the reloading of the config (aka graceful restart) supports the
on-th-fly manipulation of the upstream SMSC groups and the corresponding
routing constraints for them.
What is doesn't support is the on-the-fly changing of the 'group =
smsbox-route' MO direction routing.
The attached patch adds this capability.
Please review. If no objections arise I will commit this within the next
days.
committed to svn trunk, r5269.
--
Best Regards,
Stipe Tolj
-------------------------------------------------------------------
DÃŒsseldorf, NRW, Germany
Kannel Foundation tolj.org <http://tolj.org/> system architecture
http://www.kannel.org/ <http://www.kannel.org/> http://www.tolj.org/ <http://www.tolj.org/>
stolj at kannel.org <http://kannel.org/> st at tolj.org <http://tolj.org/>
-------------------------------------------------------------------
Stipe Tolj
2018-10-17 08:42:11 UTC
Permalink
Post by a***@kannel.org
Hi Stipe,
sorry I was not available earlier…
Hi Alex,
Post by a***@kannel.org
just checked your patch and see man many issues…
1.
+static void init_smsbox_routes(Cfg *cfg, int reload)
you use continue if config is wrong
but you already destroyed previous configuration
- /* send new config to clients */
+ gw_rwlock_wrlock(smsbox_list_rwlock);
+ dict_destroy(smsbox_by_smsc);
+ dict_destroy(smsbox_by_receiver);
+ dict_destroy(smsbox_by_smsc_receiver);
+ smsbox_by_smsc = dict_create(30, (void(*)(void *)) octstr_destroy);
+ smsbox_by_receiver = dict_create(50, (void(*)(void *)) octstr_destroy);
+ smsbox_by_smsc_receiver = dict_create(50, (void(*)(void *))
octstr_destroy);
+ init_smsbox_routes(cfg, 1);
+ gw_rwlock_unlock(smsbox_list_rwlock);
That means, you have inconsistency
hmm, yes we do this intentionally actually.

We destroy the structures, then re-parse the 'group = smsbox-route'
groups from the configuration. If a group has an error, we only dump the
error but continue to parse the following groups.

A restart would formally break via PANIC, since the config has an issue.
But we keep on going with all groups that are configured correctly.

I'm assuming you would rather re-parse the new config, and ONLY if it
has no error swap the new config with the old one. If it contains an
error rather keep running the old one?
Post by a***@kannel.org
2. if I see reload configs without a lock… Uhh I can construct soo many
cases, e.g. smsc is renamed in new config but still used in the old
smsbox group, etc. etc.
int bb_graceful_restart(void)
{
- return smsc2_graceful_restart();
+ Cfg *cfg;
+
+ info(0, "Reloading configuration resource `%s'.",
octstr_get_cstr(cfg_filename));
+ cfg = cfg_create(cfg_filename);
+ if (cfg_read(cfg) == -1) {
+ error(0, "Error processing configuration resource `%s'. Continue with
existing configuration.",
+ octstr_get_cstr(cfg_filename));
+ return -1;
+ }
+
+ smsc2_graceful_restart(cfg);
+ smsbox_restart(cfg);
+ return 0;
}
the smsbox_restart() locks down via

gw_rwlock_wrlock(smsbox_list_rwlock);

and the MO receiving part do route_incoming_to_boxc(), which sets:

gw_rwlock_rdlock(smsbox_list_rwlock);

so we SHOULD never be in a race-condition where we would try to get a MO
routing and the structures are destroyed in the middle of the process.
Right?

I don't understand the argument with "smsc is renamed in new config but
still used in the old smsbox group". Could you please explain?

Cheers,
Stipe
--
Best Regards,
Stipe Tolj

-------------------------------------------------------------------
Düsseldorf, NRW, Germany

Kannel Foundation tolj.org system architecture
http://www.kannel.org/ http://www.tolj.org/

stolj at kannel.org st at tolj.org
-------------------------------------------------------------------
Stipe Tolj
2018-11-03 16:18:10 UTC
Permalink
Post by Stipe Tolj
hmm, yes we do this intentionally actually.
We destroy the structures, then re-parse the 'group = smsbox-route'
groups from the configuration. If a group has an error, we only dump the
error but continue to parse the following groups.
A restart would formally break via PANIC, since the config has an issue.
But we keep on going with all groups that are configured correctly.
I'm assuming you would rather re-parse the new config, and ONLY if it
has no error swap the new config with the old one. If it contains an
error rather keep running the old one?
Post by a***@kannel.org
2. if I see reload configs without a lock… Uhh I can construct soo many
cases, e.g. smsc is renamed in new config but still used in the old
smsbox group, etc. etc.
int bb_graceful_restart(void)
{
- return smsc2_graceful_restart();
+ Cfg *cfg;
+
+ info(0, "Reloading configuration resource `%s'.",
octstr_get_cstr(cfg_filename));
+ cfg = cfg_create(cfg_filename);
+ if (cfg_read(cfg) == -1) {
+ error(0, "Error processing configuration resource `%s'. Continue with
existing configuration.",
+ octstr_get_cstr(cfg_filename));
+ return -1;
+ }
+
+ smsc2_graceful_restart(cfg);
+ smsbox_restart(cfg);
+ return 0;
}
the smsbox_restart() locks down via
gw_rwlock_wrlock(smsbox_list_rwlock);
gw_rwlock_rdlock(smsbox_list_rwlock);
so we SHOULD never be in a race-condition where we would try to get a MO
routing and the structures are destroyed in the middle of the process.
Right?
I don't understand the argument with "smsc is renamed in new config but
still used in the old smsbox group". Could you please explain?
Hi Alex,

any update here from your side?

Thanks,
Stipe
--
Best Regards,
Stipe Tolj

-------------------------------------------------------------------
Düsseldorf, NRW, Germany

Kannel Foundation tolj.org system architecture
http://www.kannel.org/ http://www.tolj.org/

***@kannel.org ***@tolj.org
-------------------------------------------------------------------
a***@kannel.org
2018-11-04 10:34:44 UTC
Permalink
Hi Stipe,
Post by Stipe Tolj
Post by a***@kannel.org
Hi Stipe,
sorry I was not available earlier

Hi Alex,
Post by a***@kannel.org
just checked your patch and see man many issues

1.
+static void init_smsbox_routes(Cfg *cfg, int reload)
you use continue if config is wrong
but you already destroyed previous configuration
- /* send new config to clients */
+ gw_rwlock_wrlock(smsbox_list_rwlock);
+ dict_destroy(smsbox_by_smsc);
+ dict_destroy(smsbox_by_receiver);
+ dict_destroy(smsbox_by_smsc_receiver);
+ smsbox_by_smsc = dict_create(30, (void(*)(void *)) octstr_destroy);
+ smsbox_by_receiver = dict_create(50, (void(*)(void *)) octstr_destroy);
+ smsbox_by_smsc_receiver = dict_create(50, (void(*)(void *))
octstr_destroy);
+ init_smsbox_routes(cfg, 1);
+ gw_rwlock_unlock(smsbox_list_rwlock);
That means, you have inconsistency
hmm, yes we do this intentionally actually.
We destroy the structures, then re-parse the 'group = smsbox-route' groups from the configuration. If a group has an error, we only dump the error but continue to parse the following groups.
A restart would formally break via PANIC, since the config has an issue. But we keep on going with all groups that are configured correctly.
I'm assuming you would rather re-parse the new config, and ONLY if it has no error swap the new config with the old one. If it contains an error rather keep running the old one?
Yes, keep running with old configuration and let caller know that reload failed.
Post by Stipe Tolj
Post by a***@kannel.org
2. if I see reload configs without a lock
 Uhh I can construct soo many
cases, e.g. smsc is renamed in new config but still used in the old
smsbox group, etc. etc.
int bb_graceful_restart(void)
{
- return smsc2_graceful_restart();
+ Cfg *cfg;
+
+ info(0, "Reloading configuration resource `%s'.",
octstr_get_cstr(cfg_filename));
+ cfg = cfg_create(cfg_filename);
+ if (cfg_read(cfg) == -1) {
+ error(0, "Error processing configuration resource `%s'. Continue with
existing configuration.",
+ octstr_get_cstr(cfg_filename));
+ return -1;
+ }
+
+ smsc2_graceful_restart(cfg);
+ smsbox_restart(cfg);
+ return 0;
}
the smsbox_restart() locks down via
gw_rwlock_wrlock(smsbox_list_rwlock);
gw_rwlock_rdlock(smsbox_list_rwlock);
so we SHOULD never be in a race-condition where we would try to get a MO routing and the structures are destroyed in the middle of the process. Right?
It’s not about structure destroyed, it’s about inconsistent configuration that you have without full lock. Let’s build some scenario:

1. smsc2_graceful_restart(cfg);
2. (1) has renamed smsc-id or changed allowed/denied smsc-id
3. MO arrive but routed wrong because we still use old MO routing configuration
4. MO routing configuration gets reloaded
4.a if reload failed , whole configuration is broken/inconsistent
4.b if reload was ok, we are fine again

As you see we have issues (3) and 4.a . Therefore it’s not right to see to reload config partially. Reload has to be atomic operation.
Post by Stipe Tolj
I don't understand the argument with "smsc is renamed in new config but still used in the old smsbox group". Could you please explain?
Cheers,
Stipe
--
Best Regards,
Stipe Tolj
-------------------------------------------------------------------
DÃŒsseldorf, NRW, Germany
Kannel Foundation tolj.org <http://tolj.org/> system architecture
http://www.kannel.org/ <http://www.kannel.org/> http://www.tolj.org/ <http://www.tolj.org/>
stolj at kannel.org <http://kannel.org/> st at tolj.org <http://tolj.org/>
-------------------------------------------------------------------
Loading...