31 commits
6e561df
adding lambdas to select statements for customer lists and reports
SethHamilton Nov 13, 2019
6efbd96
roughing in customer list functionality
SethHamilton Nov 14, 2019
6127096
first version of customer list query returning JSON
SethHamilton Nov 15, 2019
1b5c9f3
replaced customer props store and encode, fixed profiling perf problems
SethHamilton Nov 20, 2019
b3a77c2
fixed code, tests now pass
SethHamilton Nov 20, 2019
414db6f
customer indexes are working at a basic level
SethHamilton Nov 21, 2019
cd6b5ba
IndexBit caching to reduce index thrashing on insert
SethHamilton Nov 23, 2019
cc97316
new memory management for index bits, LRU for bits
SethHamilton Nov 25, 2019
cf8214e
new memory management for index bits, LRU for bits
SethHamilton Nov 25, 2019
f32b56c
added includes for linux/gcc
SethHamilton Nov 25, 2019
e6e7d68
std::pair hash fix, asc/desc sorts by values and basic sort
SethHamilton Nov 26, 2019
f94349b
changed to const reference
SethHamilton Nov 26, 2019
979fa68
gcc compatibility fix
SethHamilton Nov 26, 2019
dfa8a7b
gcc compatibility fix
SethHamilton Nov 26, 2019
63e5048
gcc compatibility fix
SethHamilton Nov 26, 2019
be98d19
documentation updates
SethHamilton Nov 27, 2019
beceea4
small block allocator fix, always_fresh flag, index def fix
SethHamilton Nov 29, 2019
368dafe
fix for always_fresh segment flag
SethHamilton Nov 29, 2019
dbb91ad
fixed blhash.h iterator bug, double entries
SethHamilton Dec 2, 2019
86c0035
fixed cache eviction
SethHamilton Dec 3, 2019
82c55cd
version bump
SethHamilton Dec 3, 2019
a5bd834
fixed leak in index load/writeback
SethHamilton Dec 3, 2019
0d9806c
fixed leak in index load/writeback
SethHamilton Dec 3, 2019
03bb666
remove unused allocation total
SethHamilton Dec 3, 2019
106b728
potentially fixed index writeback bug
SethHamilton Dec 4, 2019
2e28622
properties query fix, skip empty pages in index
SethHamilton Dec 5, 2019
c89f714
set up for 16 workers
SethHamilton Dec 6, 2019
3f35056
flow control, averaging fix, bucket indexes
SethHamilton Dec 11, 2019
543660c
version bump
SethHamilton Dec 12, 2019
1d59378
fixed partition hashing bug, faster inserts, list limits
SethHamilton Dec 13, 2019
3c4921a
increase thread limits for 32 core setup
SethHamilton Dec 13, 2019
11 changes: 9 additions & 2 deletions CMakeLists.txt
@@ -83,10 +83,9 @@ set(SOURCE_FILES
lib/include/libcommon.h
lib/mem/bloom.cpp
lib/mem/bloom.h
lib/mem/prequeues.cpp
lib/mem/prequeues.h
lib/mem/ssdict.h
lib/mem/blhash.h
lib/mem/segmented_list.h
lib/str/strtools.cpp
lib/str/strtools.h
lib/threads/spinlock.h
@@ -108,6 +107,10 @@ set(SOURCE_FILES
src/attributes.h
src/config.cpp
src/config.h
src/customer_index.cpp
src/customer_index.h
src/customer_props.cpp
src/customer_props.h
src/database.cpp
src/database.h
src/dbtypes.h
@@ -136,6 +139,10 @@ set(SOURCE_FILES
src/oloop_cleaner.h
src/oloop_customer.cpp
src/oloop_customer.h
src/oloop_customer_basic.cpp
src/oloop_customer_basic.h
src/oloop_customer_list.cpp
src/oloop_customer_list.h
src/oloop_histogram.cpp
src/oloop_histogram.h
src/oloop_insert.cpp
22 changes: 15 additions & 7 deletions README.md
@@ -8,10 +8,10 @@

| Platform | Version | Info | Status |
| :---------- | :-----: | :------------------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Linux x64 | 0.4.4 | gcc 7.2, release, debug | [![Build Status](https://travis-ci.org/opset/openset.svg?branch=master)](https://travis-ci.org/opset/openset) |
| Windows x64 | 0.4.4 | Visual C++ 2017, release, debug | [![Build status](https://ci.appveyor.com/api/projects/status/pr8jrhfth2bt7j6r/branch/master?svg=true)](https://ci.appveyor.com/project/SethHamilton/openset/branch/master) |
| Linux x64 | 0.4.5 | gcc 7.2, release, debug | [![Build Status](https://travis-ci.org/opset/openset.svg?branch=master)](https://travis-ci.org/opset/openset) |
| Windows x64 | 0.4.5 | Visual C++ 2017, release, debug | [![Build status](https://ci.appveyor.com/api/projects/status/pr8jrhfth2bt7j6r/branch/master?svg=true)](https://ci.appveyor.com/project/SethHamilton/openset/branch/master) |

:coffee: OpenSet is currently in alpha. Please see v0.4.4 release notes below.
:coffee: OpenSet is currently in alpha. Please see v0.4.5 release notes below.

# What's it do?

@@ -62,7 +62,7 @@ git clone https://github.com/opset/openset_samples.git
**2. Install [Docker](https://www.docker.com/) and start OpenSet (in interactive mode).**

```bash
docker run -p 8080:8080 -e OS_HOST=127.0.0.1 -e OS_PORT=8080 --rm=true -it opset/openset_x64_rel:0.4.4
docker run -p 8080:8080 -e OS_HOST=127.0.0.1 -e OS_PORT=8080 --rm=true -it opset/openset_x64_rel:0.4.5
```

> **Note** The OpenSet images can always be found on [dockerhub](https://cloud.docker.com/u/opset/repository/docker/opset/openset_x64_rel).
@@ -146,7 +146,7 @@ response:

> :bulb: view the event data [here](https://github.com/opset/openset_samples/blob/master/data/highstreet_events.json)

**7. Let's perform an `event` query.**
**7. Let's generate a report.**

This query searches through each customer looking for matching events in a customer's history.

@@ -156,7 +156,7 @@ A cool feature of OpenSet grouping is that all branches of the result set will b

```ruby
curl \
-X POST http://127.0.0.1:8080/v1/query/highstreet/event \
-X POST http://127.0.0.1:8080/v1/query/highstreet/report \
--data-binary @- << EOF | json_pp

# define which properties we want to aggregate
@@ -527,7 +527,7 @@ The query then searches for the next subsequent `purchase` event and records the

```ruby
curl \
-X POST http://127.0.0.1:8080/v1/query/highstreet/event \
-X POST http://127.0.0.1:8080/v1/query/highstreet/report \
--data-binary @- << EOF | json_pp
# our osl script

@@ -680,6 +680,14 @@ Ultimately DeepMetrix had to say no to Bud, but that failure planted a seed.

# Release Notes

### 0.4.5

- the `event` query endpoint has been renamed `report`. The new name better expresses the purpose of the endpoint, as events play a role in all queries.
- `id_type` is
- added a `customers` query. The customers query returns a list of customer IDs and selected `customer properties` or computed values for each customer. The list can be paginated and sorted on alternate indexes (defined when a table is created).
- faster, smaller indexes. The old index caused heavy memory reallocation as indexes grew. An LRU was also added to the indexing system to keep hot indexes in an uncompressed state.
- added lambda functions in select statements. A lambda allows a select parameter to get its value from code. This makes it possible to select the value of a variable or an inline aggregation.

### 0.4.4

- added `id_type` to switch in create table. This is now required and allows you to specify `numeric` or `textual` customer ids.
15 changes: 7 additions & 8 deletions docs/README.md
@@ -1,15 +1,14 @@
# Documentation

&nbsp;

**topics**
**Help**

* [Quick Overview](https://github.com/opset/openset/tree/master/docs/osl/README.md)
* [Scripting Language (OSL)](https://github.com/opset/openset/blob/master/docs/osl/language_reference.md)
* [API](https://github.com/opset/openset/tree/master/docs/rest/README.md)

**Nerdier Matters**
* [Docker Images](https://github.com/opset/openset/tree/master/docs/docker) (recommended - run anywhere)
* [Building and Installing](https://github.com/opset/openset/tree/master/docs/build_install) (build release or debug on windows or linux)
* [OSL query language overview](https://github.com/opset/openset/tree/master/docs/osl/README.md)
* [OSL language reference](https://github.com/opset/openset/blob/master/docs/osl/language_reference.md)
* [REST API](https://github.com/opset/openset/tree/master/docs/rest/README.md)
* [Samples](https://github.com/opset/openset_samples)
* [Clustering](#) (coming soon)

:coffee: These documents are a work in progress.

95 changes: 56 additions & 39 deletions docs/rest/README.md
@@ -1,6 +1,8 @@
# Cluster
# API

## PUT /v1/cluster/init?partitions={#}
## Cluster

### PUT /v1/cluster/init?partitions={#}

Initializes a cluster (a cluster with just **one** node will still need initializing).

@@ -12,7 +14,7 @@ Returns a 200 or 400 status code.

> :pushpin: the ideal partition count is the lowest number that will fit the size of your cluster in the long run. There is overhead incurred with each partition, but you also want to pick a number that leaves room to grow. Picking a number less than the number of processor cores in your cluster will **not** allow you to reach peak performance.
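As a hedged sketch (not taken from this PR; the host, port, and partition count are assumptions matching the docker example in the main README), initializing a fresh node might look like:

```shell
# Hypothetical example: initialize a brand-new single-node cluster.
# 127.0.0.1:8080 matches the docker defaults; 16 partitions is illustrative.
HOST="127.0.0.1:8080"
PARTITIONS=16
URL="http://${HOST}/v1/cluster/init?partitions=${PARTITIONS}"
echo "PUT ${URL}"
# Against a live node, uncomment:
# curl -X PUT "${URL}"
```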

## PUT /v1/cluster/join?host={host|ip}&port={port}
### PUT /v1/cluster/join?host={host|ip}&port={port}

**query_params:**

@@ -25,15 +27,15 @@

## Table

## POST /v1/table/{table} (create table)
### POST /v1/table/{table} (create table)

Create a table by passing a JSON array of desired table properties and types.

### id_type
#### id_type (required)

The `id_type` key specifies whether this table uses `numeric` or `textual` customer ids.
The `id_type` determines whether this table uses `numeric` or `textual` customer ids.

### properties
#### properties (required)

Properties you would like to track are defined as an array under the `properties` key.

@@ -44,11 +46,12 @@ A property at minimum requires a name and type.
- `is_set` - if provided and `true`, this property will be a collection of values rather than a single value (think product tags, i.e. 'red', 'big', 'kitchen')
- `is_customer` - if provided and `true`, this property is a special customer property. Customer properties, unlike regular properties, are associated with the customer rather than the events in their history: facts about a customer, such as `age` or `country`, or values created by an ML model.

### event_order

#### event_order (optional)

The `event_order` key allows you to specify insert sort order for event types. For example, if you want a `purchase` event to always precede `purchase_items` events you would specify `"event_order": ['purchase', 'purchase_items']`. This can make it easier to write queries as order is guaranteed on events that have the same timestamp.

### example
#### example

```
{
@@ -78,7 +81,7 @@

Returns a 200 or 400 status code.

## GET /v1/table/{table} (describe table)
### GET /v1/table/{table} (describe table)

Returns JSON describing the table.

@@ -136,11 +139,11 @@

Returns a 200 or 400 status code.

## PUT /v1/table/{table}/property/{prop_name}?{property definition params}
### PUT /v1/table/{table}/property/{prop_name}?{property definition params}

Adds a property to an existing table.

### params
#### params

- `prop_name` can be any string consisting of lowercase letters `a-z`, numbers `0-9`, or the underscore `_`. Properties cannot start with a number.
- `type` can be `text|int|double|bool`.
@@ -149,15 +152,15 @@

Returns a 200 or 400 status code.
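A hedged sketch of the call (the table and property names here are illustrative, not taken from the PR):

```shell
# Hypothetical example: add a text set-property named "tags" to "highstreet".
HOST="127.0.0.1:8080"
TABLE="highstreet"
URL="http://${HOST}/v1/table/${TABLE}/property/tags?type=text&is_set=true"
echo "PUT ${URL}"
# curl -X PUT "${URL}"   # expect 200 on success, 400 on a bad definition
```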

## DELETE /v1/table/{table}/property/{prop_name}
### DELETE /v1/table/{table}/property/{prop_name}

Removes a property from the table.

- `prop_name` can be any string consisting of lowercase letters `a-z`, numbers `0-9`, or the underscore `_`. Properties cannot start with a number.

Returns a 200 or 400 status code.

## PUT /v1/subscription/{table}/{segment_name}/{sub_name}
### PUT /v1/subscription/{table}/{segment_name}/{sub_name}

To subscribe to segment changes, the segment must already exist.

@@ -209,13 +212,13 @@ Example body for web-hook call:
}
```

# DELETE /v1/subscription/{table}/{segment_name}/{sub_name}
### DELETE /v1/subscription/{table}/{segment_name}/{sub_name}

Delete a segment subscription.

# Queries

## POST /v1/query/{table}/event
### POST /v1/query/{table}/event

Analytics are generated by calling the `event` endpoint.

@@ -230,16 +233,12 @@
| `sort=` | `prop_name` | sort by `select` property name, or by the `as` name if specified. Specifying `sort=group` will sort the result set by grouping names. |
| `order=` | `asc/desc` | default is descending order. |
| `trim=` | `# limit` | clip long branches at a certain count. Root nodes will still include totals for the entire branch. |
| `str_{var_name}` | `text` | populates variable of the same name in the params block with a string value |
| `int_{var_name}` | `integer` | populates variable of the same name in the params block with a integer value |
| `dbl_{var_name}` | `double` | populates variable of the same name in the params block with a double value |
| `bool_{var_name}` | `true/false` | populates variable of the same name in the params block with a boolean value |

**result**

200 or 400 status with JSON data or error.
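Putting the parameters above together, a sketch of an event query call (the OSL script itself is elided; `query.osl`, the table name, and the parameter values are assumptions for illustration):

```shell
HOST="127.0.0.1:8080"
TABLE="highstreet"
# sort ascending by a selected property and clip long branches to 10 entries
PARAMS="sort=purchase_total&order=asc&trim=10"
URL="http://${HOST}/v1/query/${TABLE}/event?${PARAMS}"
echo "POST ${URL}"
# curl -X POST "${URL}" --data-binary @query.osl
```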

## POST /v1/query/{table}/segment
### POST /v1/query/{table}/segment

This will perform an index counting query by executing the provided `OSL` script in the POST body as `text/plain`. The result will be in JSON and contain results or any errors produced by the query.

@@ -255,7 +254,7 @@

**post body:**

The post body can include multiple sections. The `@` decorator is used to define sections. The example below is using the sample `high_street` sample data to create two segments named `products_home` and `products_outdoor`.
The post body can include multiple segment definitions. The `@` decorator is used to define a code block for each segment. The example below uses the `high_street` sample data to create two segments named `products_home` and `products_outdoor`.

The `params` on the `@segment` definition tell OpenSet not to recalculate the segment if it is within the TTL and that it is OK to use a cached version. They also tell OpenSet to refresh this segment about every 300 seconds.

@@ -293,7 +292,7 @@ end

200 or 400 status with JSON data or error.

## GET /v1/query/{table}/property/{prop_name}
### GET /v1/query/{table}/property/{prop_name}

The property query allows you to query all the values within a named property in a table as well as perform searches and numeric grouping.

@@ -318,24 +317,42 @@ The property query allows you to query all the values within a named property in

200 or 400 status with JSON data or error.

## GET /v1/query/{table}/customer
### GET /v1/query/{table}/customer

Returns the event sequence for an individual customer.

> :pushpin: If events contain complex data (i.e. sub-values), OpenSet will re-condense the data by folding up the data permutations generated on insert. The folded row may be grouped differently than the one provided to `/insert` but will be logically identical.
**query parameters:**

| param | values | note |
| ------ | ------------- | ------------ |
| `id=` | `number/text` | Customer ID |

**result**

200 or 400 status with JSON data or error.
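A minimal sketch, assuming a table with numeric customer IDs (the table name and ID value are illustrative):

```shell
HOST="127.0.0.1:8080"
TABLE="highstreet"
URL="http://${HOST}/v1/query/${TABLE}/customer?id=1001"
echo "GET ${URL}"
# curl "${URL}"   # returns the customer's event sequence as JSON
```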

### POST /v1/query/{table}/customers

Customer lists are generated by calling the `customers` endpoint.

This will perform a customer list query by executing the provided `OSL` script in the POST body as `text/plain`. The result will be in JSON and contain results or any errors produced by the query.

**query parameters:**

| param | values | note |
| ------ | -------- | ----------------------------------------------------- |
| `sid=` | `string` | If you are using textual IDs use the `sid=` parameter |
| `id=` | `number` | If you are using numeric IDs use the `id=` parameter |
| param | values | note |
| ------------------ | ------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| `debug=` | `true/false` | will return the assembly for the query rather than the results |
| `segments=` | `segment` | comma-separated segment list. Segments must be created with a `/segment` query (see the section above). The default segment is `*` (all customers) |
| `sort=` | `prop_name` | Name of property to sort by. |
| `order=` | `asc/desc` | default is descending order. |
| `trim=` | `# limit` | clip long branches at a certain count. Root nodes will still include totals for the entire branch. |
| `cursor=` | `key,key` | a resume-from cursor is returned with each query to allow for pagination. |

**result**

200 or 400 status with JSON data or error.
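A sketch of paginating a customer list (the sort property, cursor keys, and `list.osl` script are hypothetical; a real cursor comes back in each response):

```shell
HOST="127.0.0.1:8080"
TABLE="highstreet"
SORT="sort=last_event&order=desc"
FIRST="http://${HOST}/v1/query/${TABLE}/customers?${SORT}"
echo "POST ${FIRST}"
# each response carries a resume-from cursor; pass it back for the next page:
NEXT="${FIRST}&cursor=key1,key2"
echo "POST ${NEXT}"
# curl -X POST "${NEXT}" --data-binary @list.osl
```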

## POST /v1/query/{table}/histogram/{name}
### POST /v1/query/{table}/histogram/{name}

This will generate a histogram using an `OSL` script in the POST body as `text/plain`. The result will be in JSON and contain results or any errors produced by the query.

@@ -376,7 +393,7 @@ return( to_weeks(now - last_stamp) )

200 or 400 status with JSON data or error.

## POST /v1/query/{table}/batch (experimental)
### POST /v1/query/{table}/batch (experimental)

Run multiple segment, property, and histogram queries at once and generate a single result, including `foreach` on histograms.

Expand Down Expand Up @@ -416,12 +433,12 @@ end

```

# Internode (internode node chatter)
## Internode (internode chatter)

Don't call these from client code.
The `/v1/internode` REST interface is used internally to maintain a properly functioning cluster.

## GET /v1/cluster/is_member
### GET /v1/cluster/is_member

This will return a JSON object indicating whether the node is already part of a cluster:

@@ -431,37 +448,37 @@ This will return a JSON object informing if the node is already part of a cluste
}
```

## POST /v1/internode/join_to_cluster
### POST /v1/internode/join_to_cluster

Joins an empty node to the cluster. This originates with the `/v1/cluster/join` endpoint. `/v1/cluster/join` will issue a `/v1/internode/is_cluster_member` and verify the certificate before this endpoint (`/v1/internode/join_to_cluster`) is called.

This endpoint transfers information about tables, subscribers, and partition mapping.

## POST /v1/internode/add_node
### POST /v1/internode/add_node

Dispatched to all nodes by `/v1/cluster/join` to inform all nodes in the cluster that a new node has been joined to the cluster. Nodes receiving `add_node` will adjust their node mapping.

At this point the node will be empty. The `sentinel` for the elected node will start balancing to this node shortly after this dispatch.

## POST /v1/internode/map_change
### POST /v1/internode/map_change

Dispatched by `sentinel` when node mapping and membership have changed. This is the basic mechanism that keeps cluster topology in sync.

## PUT /v1/internode/transfer?partition={partition_id}&node={dest_node_name}
### PUT /v1/internode/transfer?partition={partition_id}&node={dest_node_name}

This initiates a partition transfer. The node containing the partition to transfer is contacted directly. It is provided the `partition_id` to transfer and the `dest_node_name` to send it to.

This will result in potentially several transfers, one for each table using `POST /v1/internode/transfer`. The recipient receives `partition_id` and `table_name` for each block.

After a successful transfer the `sentinel` will send a `POST /v1/internode/map_change` request to tell the cluster that the partition is available.

## POST /v1/internode/transfer?partition={partition_id}&table={table_name}
### POST /v1/internode/transfer?partition={partition_id}&table={table_name}

Transfers packed `binary` data for a partition. The `partition_id` is passed in the URL as an integer.

# Other

## GET /ping
### GET /ping

If the node is running, this will respond with 200 OK and JSON:
