Simple Dynamic Strings
===

SDS is a string library for C designed to augment the limited libc string
handling functionalities by adding heap allocated strings that are:

* Simpler to use.
* Binary safe.
* Computationally more efficient.
* But yet... Compatible with normal C string functions.

This is achieved using an alternative design in which instead of using a C
structure to represent a string, we use a binary prefix that is stored
before the actual pointer to the string that is returned by SDS to the user.

    +--------+-------------------------------+-----------+
    | Header | Binary safe C alike string... | Null term |
    +--------+-------------------------------+-----------+
             |
             `-> Pointer returned to the user.

Because the metadata is stored as a prefix before the actual returned pointer,
and because every SDS string implicitly adds a null term at the end of
the string regardless of the actual content of the string, SDS strings work
well together with C strings and the user is free to use them interchangeably
with functions that access the string in read-only mode.
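The layout in the diagram above can be sketched in a few lines of C. This is only an illustration of the prefix idea, not the real SDS implementation: the `toy_hdr` struct and the `toy_newlen`, `toy_len` and `toy_free` functions are hypothetical names, and the actual SDS header contains different fields.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header for illustration only; the real SDS header differs. */
struct toy_hdr {
    size_t len;   /* Bytes in the string, not counting the null term. */
    char buf[];   /* String bytes: this is the address given to the user. */
};

/* Allocate header, payload and null term as a single block and return
 * a pointer to the payload. A NULL init yields a zero-filled string. */
char *toy_newlen(const void *init, size_t initlen) {
    struct toy_hdr *h = malloc(sizeof(*h) + initlen + 1);
    if (h == NULL) return NULL;
    h->len = initlen;
    if (init != NULL) memcpy(h->buf, init, initlen);
    else memset(h->buf, 0, initlen);
    h->buf[initlen] = '\0'; /* Always terminated, whatever the content. */
    return h->buf;
}

/* Step back from the user pointer to the header: constant time. */
size_t toy_len(const char *s) {
    const struct toy_hdr *h =
        (const struct toy_hdr *)(s - offsetof(struct toy_hdr, buf));
    return h->len;
}

/* Free the whole block, starting from the header. NULL is a no-op. */
void toy_free(char *s) {
    if (s == NULL) return;
    free(s - offsetof(struct toy_hdr, buf));
}
```

Since the returned pointer addresses the first byte of the string, it can be handed directly to `printf` or `strlen`, while `toy_len` still reports the true length even when the content includes zero bytes.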

SDS was a C string library I developed in the past for my everyday C programming
needs. Later it was moved into Redis, where it is used extensively and where it
was modified in order to be suitable for high performance operations. Now it has
been extracted from Redis and forked as a standalone project.

Because of its many years of life inside Redis, SDS provides both higher level
functions for easy string manipulation in C and a set of low level
functions that make it possible to write high performance code without paying
a penalty for using a higher level string library.

Advantages and disadvantages of SDS
===

Normally dynamic string libraries for C are implemented using a structure
that defines the string. The structure has a pointer field that is managed
by the string function, so it looks like this:

```c
struct yourAverageStringLibrary {
    char *buf;
    size_t len;
    ... possibly more fields here ...
};
```

SDS strings, as already mentioned, don't follow this schema, and are instead
a single allocation with a prefix that lives *before* the address actually
returned for the string.

There are advantages and disadvantages with this approach over the traditional
approach:

**Disadvantage #1**: many functions return the new string as a value, since sometimes SDS needs to create a new string with more space, so most SDS API calls look like this:

```c
s = sdscat(s,"Some more data");
```

As you can see `s` is used as input for `sdscat` but is also set to the value
returned by the SDS API call, since we are not sure if the call modified the
SDS string we passed or allocated a new one. Not remembering to assign back
the return value of `sdscat` or similar functions to the variable holding
the SDS string will result in a bug.
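To see why the return value must be assigned back, here is a minimal sketch of a concatenation function built on a prefix header and `realloc`. It is not the real `sdscat`: `toy_hdr`, `TOY_HDR`, `toy_new` and `toy_cat` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header for illustration only; the real SDS header differs. */
struct toy_hdr {
    size_t len;
    char buf[];
};

/* Recover the header from the pointer given to the user. */
#define TOY_HDR(s) ((struct toy_hdr *)((s) - offsetof(struct toy_hdr, buf)))

char *toy_new(const char *init) {
    size_t len = strlen(init);
    struct toy_hdr *h = malloc(sizeof(*h) + len + 1);
    if (h == NULL) return NULL;
    h->len = len;
    memcpy(h->buf, init, len + 1); /* Copies the null term too. */
    return h->buf;
}

/* Append t to s. realloc() may move the block, so the pointer returned
 * here can differ from s: after a successful call the old s is invalid. */
char *toy_cat(char *s, const char *t) {
    struct toy_hdr *h = TOY_HDR(s);
    size_t tlen = strlen(t);
    struct toy_hdr *nh = realloc(h, sizeof(*nh) + h->len + tlen + 1);
    if (nh == NULL) return NULL; /* On failure s is untouched and valid. */
    memcpy(nh->buf + nh->len, t, tlen + 1);
    nh->len += tlen;
    return nh->buf;
}
```

A caller that writes `toy_cat(s, "more")` and keeps using the old `s` may be reading freed memory whenever the block was moved, which is exactly the bug described above.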

**Disadvantage #2**: if an SDS string is shared in different places in your program you have to modify all the references when you modify the string. However, most of the time when you need to share SDS strings it is much better to encapsulate them into structures with a `reference count`, otherwise it is too easy to incur memory leaks.
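One possible shape for such a structure is sketched below. The `shared_str` type and its functions are hypothetical and not part of SDS. To keep the sketch self-contained, the `str` field is a plain `malloc`ed C string standing in for an SDS string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper, not part of SDS. Every holder keeps a pointer to
 * this struct, so when the string is reallocated only `str` changes. */
typedef struct shared_str {
    int refcount;
    char *str; /* Stand-in for an SDS string. */
} shared_str;

shared_str *shared_create(const char *init) {
    size_t len = strlen(init);
    shared_str *o = malloc(sizeof(*o));
    if (o == NULL) return NULL;
    o->str = malloc(len + 1);
    if (o->str == NULL) {
        free(o);
        return NULL;
    }
    memcpy(o->str, init, len + 1);
    o->refcount = 1;
    return o;
}

/* Call once for every additional place that stores the pointer. */
void shared_retain(shared_str *o) {
    o->refcount++;
}

/* Drop one reference. Returns 1 if the object was freed, 0 otherwise. */
int shared_release(shared_str *o) {
    if (--o->refcount > 0) return 0;
    free(o->str); /* With a real SDS string this would be sdsfree(). */
    free(o);
    return 1;
}
```

With this arrangement a call such as `o->str = sdscat(o->str, "...")` would update the single place where the SDS pointer lives, and all the holders of `o` see the new string.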

**Advantage #1**: you can pass SDS strings to functions designed for C strings without accessing a struct member or calling a function, like this:

```c
printf("%s\n", sds_string);
```

In most other libraries this will be something like:

```c
printf("%s\n", string->buf);
```

Or:

```c
printf("%s\n", getStringPointer(string));
```

**Advantage #2**: accessing individual chars is straightforward. C is a low level language so this is an important operation in many programs. With SDS strings accessing individual chars is very natural:

```c
printf("%c %c\n", s[0], s[1]);
```

With other libraries your best chance is to assign `string->buf` (or call the function to get the string pointer) to a `char` pointer and work with this. However since the other libraries may reallocate the buffer implicitly every time you call a function that may modify the string you have to get a reference to the buffer again.

**Advantage #3**: single allocation has better cache locality. Usually when you access a string created by a string library using a structure, you have two different allocations: one for the structure representing the string, and one for the actual buffer holding the string. Over time the buffer is reallocated, and it is likely that it ends up in a totally different part of memory compared to the structure itself. Since the performance of modern programs is often dominated by cache misses, SDS may perform better in many workloads.

SDS basics
===

The type of SDS strings is just the char pointer `char *`. However SDS defines
an `sds` type as alias of `char *` in its header file: you should use the
`sds` type in order to make sure you remember that a given variable in your
program holds an SDS string and not a C string, however this is not mandatory.

This is the simplest SDS program you can write that does something:

```c
sds mystring = sdsnew("Hello World!");
printf("%s\n", mystring);
sdsfree(mystring);

output> Hello World!
```

The above small program already shows a few important things about SDS:

* SDS strings are created, and heap allocated, via the `sdsnew()` function, or other similar functions that we'll see in a moment.
* SDS strings can be passed to `printf()` like any other C string.
* SDS strings must be freed with `sdsfree()`, since they are heap allocated.

Creating SDS strings
---

```c
sds sdsnewlen(const void *init, size_t initlen);
sds sdsnew(const char *init);
sds sdsempty(void);
sds sdsdup(const sds s);
```

There are many ways to create SDS strings:

* The `sdsnew` function creates an SDS string starting from a C null terminated string. We already saw how it works in the above example.
* The `sdsnewlen` function is similar to `sdsnew` but instead of creating the string assuming that the input string is null terminated, it gets an additional length parameter. This way you can create a string using binary data:

    ```c
    char buf[3];
    sds mystring;

    buf[0] = 'A';
    buf[1] = 'B';
    buf[2] = 'C';
    mystring = sdsnewlen(buf,3);
    printf("%s of len %d\n", mystring, (int) sdslen(mystring));

    output> ABC of len 3
    ```

  Note: the `sdslen` return value is cast to `int` because it returns a `size_t`
  type. You can use the right `printf` specifier (`%zu`) instead of casting.

* The `sdsempty()` function creates an empty zero-length string:

    ```c
    sds mystring = sdsempty();
    printf("%d\n", (int) sdslen(mystring));

    output> 0
    ```

* The `sdsdup()` function duplicates an already existing SDS string:

    ```c
    sds s1, s2;

    s1 = sdsnew("Hello");
    s2 = sdsdup(s1);
    printf("%s %s\n", s1, s2);

    output> Hello Hello
    ```

Obtaining the string length
---

```c
size_t sdslen(const sds s);
```

In the examples above we already used the `sdslen` function in order to get
the length of the string. This function works like `strlen` of the libc
except that:

* It runs in constant time since the length is stored in the prefix of SDS strings, so calling `sdslen` is not expensive even when called with very large strings.
* The function is binary safe like any other SDS string function, so the length is the true length of the string regardless of the content, there is no problem if the string includes null term characters in the middle.

As an example of the binary safeness of SDS strings, we can run the following
code:

```c
sds s = sdsnewlen("A\0\0B",4);
printf("%d\n", (int) sdslen(s));

output> 4
```

Note that SDS strings are always null terminated at the end, so even in that
case `s[4]` will be a null term, however printing the string with `printf`
would result in just `"A"` to be printed since libc will treat the SDS string
like a normal C string.

Destroying strings
---

```c
void sdsfree(sds s);
```

To destroy an SDS string, just call `sdsfree` with the string
pointer. However note that empty strings created with `sdsempty` need to be
destroyed as well, otherwise they'll result in a memory leak.

The function `sdsfree` does not perform any operation if instead of an SDS
string pointer, `NULL` is passed, so you don't need to check for `NULL` explicitly before calling it:

```c
if (string) sdsfree(string); /* Not needed. */
sdsfree(string); /* Same effect but simpler. */
```

Concatenating strings
---

Concatenating strings to other strings is likely the operation you will end
up using the most with a dynamic C string library. SDS provides different
functions to concatenate strings to existing strings.

```c
sds sdscatlen(sds s, const void *t, size_t len);
sds sdscat(sds s, const char *t);
```

The main string concatenation functions are `sdscatlen` and `sdscat`. They are
identical, the only difference being that `sdscat` does not have an explicit
length argument since it expects a null terminated string.

```c
sds s = sdsempty();
s = sdscat(s, "Hello ");
s = sdscat(s, "World!");
printf("%s\n", s);

output> Hello World!
```

Sometimes you want to cat an SDS string to another SDS string, so you don't
need to specify the length, but at the same time the string does not need to
be null terminated but can contain any binary data. For this there is a
special function:

```c
sds sdscatsds(sds s, const sds t);
```

Usage is straightforward:

```c
sds s1 = sdsnew("aaa");
sds s2 = sdsnew("bbb");
s1 = sdscatsds(s1,s2);
sdsfree(s2);
printf("%s\n", s1);

output> aaabbb
```

Sometimes you don't want to append any special data to the string, but you want
to make sure that there are at least a given number of bytes composing the
whole string.

```c
sds sdsgrowzero(sds s, size_t len);
```

The `sdsgrowzero` function will do nothing if the current string length is
already `len` bytes or more, otherwise it will enlarge the string to `len`,
padding it with zero bytes.

```c
sds s = sdsnew("Hello");
s = sdsgrowzero(s,6);
s[5] = '!'; /* We are sure this is safe because of sdsgrowzero() */
printf("%s\n", s);

output> Hello!
```

Formatting strings
---

There is a special string concatenation function that accepts a `printf`-like
format specifier and cats the formatted string to the specified string.

```c
sds sdscatprintf(sds s, const char *fmt, ...);
```

Example:

```c
sds s;
int a = 10, b = 20;
s = sdsnew("The sum is: ");
s = sdscatprintf(s,"%d+%d = %d",a,b,a+b);
```

Often you need to create an SDS string directly from `printf` format specifiers.
Because `sdscatprintf` is actually a function that concatenates strings, all
you need is to concatenate your string to an empty string:


```c
char *name = "Anna";
int loc = 2500;
sds s;
s = sdscatprintf(sdsempty(), "%s wrote %d lines of LISP\n", name, loc);
```

You can use `sdscatprintf` in order to convert numbers into SDS strings:

```c
int some_integer = 100;
sds num = sdscatprintf(sdsempty(),"%d\n", some_integer);
```

However this is slow and we have a special function to make it efficient.

Fast number to string operations
---

Creating an SDS string from an integer may be a common operation in certain
kinds of programs, and while you may do this with `sdscatprintf` the performance
hit is big, so SDS provides a specialized function.

```c
sds sdsfromlonglong(long long value);
```

Use it like this:

```c
sds s = sdsfromlonglong(10000);
printf("%d\n", (int) sdslen(s));

output> 5
```

Trimming strings and getting ranges
---

String trimming is a common operation where a set of characters are
removed from the left and the right of the string. Another useful operation
regarding strings is the ability to just take a range out of a larger
string.

```c
void sdstrim(sds s, const char *cset);
void sdsrange(sds s, int start, int end);
```

SDS provides both operations with the `sdstrim` and `sdsrange` functions.
However note that both functions work differently than most functions modifying
SDS strings since the return type is `void`: basically those functions always
destructively modify the passed SDS string, never allocating a new one, because
both trimming and ranges will never need more room: the operations can only
remove characters from the original strings.

Because of this behavior, both functions are fast and don't involve reallocation.

This is an example of string trimming where newlines and spaces are removed
from an SDS string:

```c
sds s = sdsnew("         my string\n\n  ");
sdstrim(s," \n");
printf("-%s-\n",s);

output> -my string-
```

Basically `sdstrim` takes the SDS string to trim as first argument, and a
null terminated set of characters to remove from left and right of the string.
The characters are removed as long as they are not interrupted by a character
that is not in the list of characters to trim: this is why the space between
`"my"` and `"string"` was preserved in the above example.
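The trimming rule described above can be sketched on a plain character buffer with an explicit length. This is an illustration of the technique, not the `sdstrim` source code: `trim_inplace` is a hypothetical helper that returns the new length instead of updating an SDS header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of SDS: remove from both ends of s the
 * bytes found in cset, in place, and return the new length. The buffer
 * must have room for len + 1 bytes. Note that strchr() also matches the
 * terminator of cset, so zero bytes at the edges get trimmed as well. */
size_t trim_inplace(char *s, size_t len, const char *cset) {
    char *start = s;
    char *end = s + len; /* One past the last byte. */
    size_t newlen;

    /* Skip bytes to trim on the left, stopping at the first byte that
     * is not in the set. */
    while (start < end && strchr(cset, *start)) start++;
    /* Same thing on the right. */
    while (end > start && strchr(cset, *(end - 1))) end--;

    newlen = (size_t)(end - start);
    /* The string can only shrink, so no allocation is needed. */
    if (start != s) memmove(s, start, newlen);
    s[newlen] = '\0';
    return newlen;
}
```

Both loops stop at the first byte that is not in `cset`, so characters in the middle of the string are never touched.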

Taking ranges is similar, but instead of taking a set of characters, it takes
two indexes, representing the start and the end as specified by zero-based
indexes inside the string, to obtain the range that will be retained.

```c
sds s = sdsnew("Hello World!");
sdsrange(s,1,4);
printf("-%s-\n", s);

output> -ello-
```

Indexes can be negative to specify a position starting from the end of the
string, so that `-1` means the last character, `-2` the penultimate, and so forth:

```c
sds s = sdsnew("Hello World!");
sdsrange(s,6,-1);
printf("-%s-\n", s);
sdsrange(s,0,-2);
printf("-%s-\n", s);

output> -World!-
output> -World-
```
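The handling of negative indexes can be sketched as follows, on a plain buffer with an explicit length. This is not the `sdsrange` source code: `range_inplace` is a hypothetical helper, and the clamping of out of range indexes shown here is an assumption about what a reasonable implementation does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of SDS: keep only the bytes between the
 * start and end indexes (both inclusive) and return the new length.
 * Negative indexes count from the end of the string, so -1 is the last
 * byte. The buffer must have room for len + 1 bytes. */
size_t range_inplace(char *s, size_t len, long start, long end) {
    long n = (long)len;
    size_t newlen;

    if (n == 0) return 0;
    /* Turn negative indexes into offsets from the start. */
    if (start < 0) {
        start += n;
        if (start < 0) start = 0;
    }
    if (end < 0) {
        end += n;
        if (end < 0) end = 0;
    }
    /* An end past the last byte is clamped to the last byte. */
    if (end >= n) end = n - 1;

    /* An empty or inverted range yields the empty string. */
    newlen = (start > end) ? 0 : (size_t)(end - start + 1);
    if (newlen != 0 && start != 0) memmove(s, s + start, newlen);
    s[newlen] = '\0';
    return newlen;
}
```

With the string `"Hello World!"`, a call with indexes `6` and `-1` keeps `"World!"`, since `-1` is turned into index `11`, the last byte.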

`sdsrange` is very useful when implementing networking servers processing
a protocol or sending messages. For example the following code is used
to implement the write handler of the Redis Cluster message bus between
nodes:

```c
void clusterWriteHandler(..., int fd, void *privdata, ...) {
    clusterLink *link = (clusterLink*) privdata;
    ssize_t nwritten = write(fd, link->sndbuf, sdslen(link->sndbuf));
    if (nwritten <= 0) {
        /* Error handling... */
    }
    sdsrange(link->sndbuf,nwritten,-1);
    ... more code here ...
}
```

Every time the socket of the node we want to send the message to is writable
we attempt to write as many bytes as possible, and we use `sdsrange` in order
to remove from the buffer what was already sent.

The function to queue new messages to send to some node in the cluster will
simply use `sdscatlen` in order to put more data in the send buffer.

Note that the Redis Cluster bus implements a binary protocol, but since SDS
is binary safe this is not a problem. The goal of SDS is not just to provide
a high level string API for the C programmer, but also dynamically allocated
buffers that are easy to manage.

String copying
---

The most dangerous and infamous function of the standard C library is probably
`strcpy`, so perhaps it is funny how in the context of better designed dynamic
string libraries the concept of copying strings is almost irrelevant. Usually
what you do is create strings with the content you want, or concatenate
more content as needed.

However SDS features a string copy function that is useful in performance