Success sending Unicode characters over SMS
Jörg Pommnitz
2001-04-30 15:59:42 UTC
Hi List,
As announced, I have been working on sending Unicode characters over SMS.
Sorry, the result is not fit for publication (it's a risk to
public health; you might lose your lunch).
Anyway, here is a short description of how I did it:

1. The HTTP interface: add a new CGI parameter, "encoding".
2. The smsbox/bearerbox message: add a new field, encoding. The
field is an integer that holds the MIBenum value IANA assigns
to each character set.
3. A new function, "octstr_recode", that recodes an octstr from one
encoding to another (based on iconv()).
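To illustrate items 2 and 3, here is a minimal sketch of recoding keyed by IANA MIBenum values. It is a Python analogue, not the actual octstr_recode (which is C on top of iconv()); the function name `recode` and the small MIBenum table are assumptions for illustration, and only a handful of encodings are covered.

```python
# Hypothetical Python analogue of octstr_recode(); the real helper uses iconv().
# Small subset of IANA MIBenum values mapped to Python codec names (assumption:
# only these encodings are supported by this sketch).
MIBENUM_TO_CODEC = {
    3: "ascii",         # US-ASCII
    4: "latin-1",       # ISO-8859-1
    106: "utf-8",       # UTF-8
    1000: "utf-16-be",  # ISO-10646-UCS-2, big-endian as carried in SMS
    1013: "utf-16-be",  # UTF-16BE
}


def recode(data: bytes, from_mib: int, to_mib: int) -> bytes:
    """Recode raw bytes from one IANA-registered encoding to another."""
    try:
        src = MIBENUM_TO_CODEC[from_mib]
        dst = MIBENUM_TO_CODEC[to_mib]
    except KeyError as exc:
        raise ValueError(f"unsupported MIBenum {exc.args[0]}") from None
    text = data.decode(src)
    # UCS-2 is fixed 16-bit; characters beyond the BMP would need surrogates.
    if to_mib == 1000 and any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return text.encode(dst)
```

For example, `recode(b"\xc3\x84", 106, 1000)` turns UTF-8 "Ä" (the `%c3%84` in the URL below) into the two UCS-2 octets `00 C4`.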

Putting these three things together, I modified pdu_encode to support
the UCS2 Data Coding Scheme.
The following command:
lynx -dump "http://localhost:8090/cgi-bin/sendsms?user=foo&password=bar&from=1234&to=12345&text=%c3%84&encoding=utf-8"

translates to the following PDU:
0011000C919471819957290008A70200C4
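To make the PDU above easier to read, here is a sketch that rebuilds it field by field. `text_from_query`, `swap_semi_octets` and `ucs2_submit_pdu` are hypothetical names, not Kannel's pdu_encode; the field values (SMS-SUBMIT with relative validity period 0xA7, international address type, DCS 0x08) are read off the PDU itself. The PDU's address field does not match the `to` value in the URL, so the destination digits here are taken from the PDU, not the request.

```python
from urllib.parse import parse_qsl, urlsplit


def text_from_query(url: str) -> str:
    """Decode the sendsms 'text' parameter using its 'encoding' parameter."""
    # latin-1 maps each percent-decoded byte to one char, so raw bytes survive.
    params = dict(parse_qsl(urlsplit(url).query, encoding="latin-1"))
    raw = params["text"].encode("latin-1")
    return raw.decode(params.get("encoding", "latin-1"))


def swap_semi_octets(digits: str) -> str:
    """Semi-octet address encoding: pad odd length with F, swap each digit pair."""
    if len(digits) % 2:
        digits += "F"
    return "".join(digits[i + 1] + digits[i] for i in range(0, len(digits), 2))


def ucs2_submit_pdu(number: str, text: str) -> str:
    """Build a hex SMS-SUBMIT PDU with DCS 0x08 (UCS2) for an international number."""
    ud = text.encode("utf-16-be")  # UCS2 user data, 2 octets per BMP character
    if len(ud) > 140:
        raise ValueError("UCS2 text longer than 70 characters needs splitting")
    return (
        "00"                    # SMSC info length 0: use the phone's default SMSC
        + "11"                  # first octet: SMS-SUBMIT, relative validity period
        + "00"                  # TP-MR: message reference
        + f"{len(number):02X}"  # TP-DA length in digits
        + "91"                  # type of address: international, ISDN plan
        + swap_semi_octets(number)
        + "00"                  # TP-PID
        + "08"                  # TP-DCS: UCS2
        + "A7"                  # TP-VP: 0xA7 = 24 hours (relative format)
        + f"{len(ud):02X}"      # TP-UDL: user data length in octets
        + ud.hex().upper()
    )
```

With the text "Ä" decoded from the URL and the digits `491718997592` from the PDU's address field, `ucs2_submit_pdu` reproduces the hex string above.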

A Nokia 7110 from Taiwan (Language setting "Chinese") properly displays
the message.

Open problems:
1. Protect public health by cleaning up the code.
2. Message size calculation.

Regards
Jörg
Richard Braakman
2001-05-02 07:03:01 UTC
Post by Jörg Pommnitz
A Nokia 7110 from Taiwan (Language setting "Chinese") properly displays
the message.
Congratulations :)
Post by Jörg Pommnitz
2. Message size calculation.
I've been thinking about this. The current setup doesn't get it right
either: it counts Latin-1 characters and assumes 160 of them will fit
in the final message, but some of those characters (such as { and }) take
up two slots, because GSM sends them as an escape code plus a character
from the extension table.
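A minimal sketch of counting in the bearer's units instead of Latin-1 characters. It assumes the text has already been checked to be representable in the GSM 03.38 default alphabet (that check is omitted); the function names are hypothetical. Extension-table characters are sent as ESC (0x1B) plus a code, so they cost two septets, while UCS2 costs two octets per character within a 140-octet user data field.

```python
# GSM 03.38 extension-table characters: sent as ESC (0x1B) + code = 2 septets.
GSM_EXTENSION_CHARS = set("\f^{}\\[~]|€")


def gsm_septets(text: str) -> int:
    """Septets needed for text, ASSUMING all chars are in the GSM default alphabet."""
    return sum(2 if ch in GSM_EXTENSION_CHARS else 1 for ch in text)


def fits_single_gsm(text: str) -> bool:
    # A single unconcatenated SMS carries at most 160 septets.
    return gsm_septets(text) <= 160


def fits_single_ucs2(text: str) -> bool:
    # UCS2 uses 2 octets per character; the user data field holds 140 octets.
    return len(text.encode("utf-16-be")) <= 140
```

So 160 plain characters fit, but 159 characters plus a single `}` need 161 septets and no longer fit in one message.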

I think the best way to handle this is to convert to the final character
set before doing the splitting. This could happen either in the smsc driver
or in the smsbox. In the former case, you'll have to transfer a lot of
information about how to do the splitting (header, footer, concatenation...)
to the bearerbox. In the latter case, you'll have to encode a lot of
knowledge about GSM in the smsbox.
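A sketch of splitting by the size each character takes in the target encoding, rather than by character count. `split_for_bearer` and the cost functions are hypothetical, and the GSM cost again assumes the text is already in the default alphabet. Splitting on whole characters keeps an ESC + extension pair in one part. The `limit` is per part: for example, with a 6-octet concatenation UDH (8-bit reference), a part carries 153 septets of GSM text or 134 octets (67 characters) of UCS2; per-service headers and footers are not modeled here.

```python
from typing import Callable

# GSM 03.38 extension-table characters: sent as ESC (0x1B) + code = 2 septets.
GSM_EXTENSION_CHARS = set("\f^{}\\[~]|€")


def gsm_cost(ch: str) -> int:
    """Septets for one char, ASSUMING it is in the GSM default alphabet."""
    return 2 if ch in GSM_EXTENSION_CHARS else 1


def ucs2_cost(ch: str) -> int:
    """Octets for one char in UCS2."""
    if ord(ch) > 0xFFFF:
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return 2


def split_for_bearer(text: str, cost: Callable[[str], int], limit: int) -> list[str]:
    """Split text so each part's encoded size is at most limit units."""
    parts: list[str] = []
    current: list[str] = []
    used = 0
    for ch in text:
        c = cost(ch)
        if c > limit:
            raise ValueError("limit too small for a single character")
        if used + c > limit:  # this char would overflow: start a new part
            parts.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += c
    if current or not parts:
        parts.append("".join(current))
    return parts
```

For instance, 152 plain characters followed by `{` need 154 septets, so with a 153-septet limit the `{` moves whole into a second part instead of being cut after its escape code.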

The basic problem here is that splitting is configured per service, but
knowledge about the bearer belongs with the SMSC configuration, and
the bearer might not always be GSM.

I think the easiest approach is to do both character-set conversion and
splitting in the smsbox, because otherwise each smsc driver has to deal
with it separately. It still means adapting each smsc driver to receive
GSM (or UCS-2) text instead of Latin-1 text, but that will actually
simplify some of them.

Richard Braakman
