List of important ideas, for a good conceptualization of the data structure #4
Replies: 2 comments 18 replies
As I've pointed out here, the current layout is only a test and not “the” solution. Splitting data into individual files may help to reduce query sizes and improve structural organization, particularly for grouping the chemical or physical properties of elements and for categorizing specific disciplines in a useful way. I'm not a fan of a single file with thousands of lines, because retrieval, maintenance, and parsing could become more difficult. Just think of the RAM it needs. In my opinion, this was a huge disadvantage of the old database.

BibTeX and its use with LaTeX could serve as a good template for our reference system. I don't think it's helpful to save full references every time. Instead, a kind of index with different namespaces and a simple ID system should be used to refer to the sources that have been used. Precise links and the access date are important, as they can later be used to display direct reference links on the website, allowing users to see where the data comes from.

Discrepancies in data from different sources are definitely a major problem, but we can easily point this out in the database's README. We should take care to warn users that data may be incomplete, outdated, or even incorrect, especially since some information could pose health risks. However, a good reference system should provide transparency about the data's origin.

You're absolutely right about providing data as ZIP or RAR files, something that can be easily automated using GitHub Actions. The same goes for automatic sorting and for checking data consistency and type safety when new commits are made. Organizations have 2000 free minutes per month, which should be sufficient for the whole project (this and other repos). And if not, I can also set up a private runner.

Translations related to the database? I'm not sure about that yet. For the (reworked) website, there will probably be a separate repository later on, just as I'm doing for my other big project.
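The reference index described above could be sketched roughly as follows. This is only an illustration: the `namespace:id` key format, the `ReferenceEntry` shape, the `resolveRefs` helper, and the sample entries are all assumptions, and the entries are placeholders rather than real sources.

```typescript
// Hypothetical shape of one entry in a shared reference index.
interface ReferenceEntry {
  title: string;
  url: string;      // precise link to the source
  accessed: string; // access date, ISO 8601 (YYYY-MM-DD)
}

// Keys are "<namespace>:<id>"; this key format is an assumption.
type ReferenceIndex = Record<string, ReferenceEntry>;

// Placeholder data, not real sources.
const references: ReferenceIndex = {
  "web:example-1": {
    title: "Example data page",
    url: "https://example.org/data/oxygen",
    accessed: "2025-01-01",
  },
  "book:example-1": {
    title: "Example handbook",
    url: "https://example.org/handbook",
    accessed: "2025-01-01",
  },
};

// Split the given IDs into resolved entries and IDs missing from the index.
function resolveRefs(
  index: ReferenceIndex,
  ids: string[],
): { found: ReferenceEntry[]; missing: string[] } {
  const found: ReferenceEntry[] = [];
  const missing: string[] = [];
  for (const id of ids) {
    // Own-property check so that keys like "toString" are not resolved by accident.
    if (Object.prototype.hasOwnProperty.call(index, id)) {
      found.push(index[id]);
    } else {
      missing.push(id);
    }
  }
  return { found, missing };
}

// A data file would then store only the short IDs.
const result = resolveRefs(references, ["web:example-1", "book:unknown"]);
console.log(result.found.length, result.missing);
```

A consistency check in CI could fail the build whenever `missing` is non-empty, so a dangling reference ID never reaches the published data.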
I've been thinking a bit about the basic type of a (physical) element property. I believe it makes sense to restructure a few things so that much more complex values can be stored and data consistency is easier to guarantee. To do so, units would need to be moved to a higher level of abstraction, and the additional variants in which an element occurs would need to be described (e.g. ozone for oxygen, deuterium for hydrogen, red and black phosphorus, etc.). Here's an idea:

Units

export type UnitDimension = 'temperature' | 'pressure' | ...;
export interface UnitDef {
  dimension: string;
  symbol: string;
  name: string;
  si_factor: number;
}

References

export type ReferenceType = 'journal' | 'book' | 'database' | ...;
export interface ReferenceDef {
  type: ReferenceType;
  title: string;
  authors?: string[];
  journal?: string;
  edition?: string;
  publisher?: string;
  year?: number;
  doi?: string;
  isbn?: string;
  url?: string;
  accessed?: string;
}

Forms / allotropes

export type FormCategory = 'allotrope' | 'phase' | 'isotope' | 'other';
export interface FormDef {
  name: string;
  category: FormCategory;
  formula?: string;
  phase?: 'solid' | 'liquid' | 'gas' | 'plasma';
  structure?: string;
}

Conditions

export interface ConditionValue {
  value: number;
  unit_ref: string;
}
export interface Conditions {
  [property: string]: ConditionValue;
}

Values

export type ValueType = 'measured' | 'calculated' | 'estimated' | 'theoretical';
// Coupled values (all of them apply simultaneously, see e.g. triple point of water)
export interface ValueSet {
  property: string; // 'temperature' | 'pressure' | ...
  value: number;
  unit_ref: string;
}
export interface PropertyValue {
  value?: number;           // single value
  value_set?: ValueSet[];   // linked values
  range?: [number, number]; // optional range
  unit_ref?: string;        // reference to UnitDef
  type?: ValueType;         // value origin
  deviation?: number;       // standard deviation / error
  conditions?: Conditions;  // independent conditions
  form_refs?: string[];     // references to FormDef
  note?: string;            // short comment
  references?: string[];    // links to ReferenceDef
}
export interface Property {
  values: PropertyValue[];
}
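To make the coupled-value idea concrete, here is a small sketch that fills the proposed shapes with the triple point of water (273.16 K at about 611.657 Pa). The interfaces are trimmed copies of the ones above so the snippet compiles on its own. The unit table keys (`K`, `Pa`, `kPa`) and the helpers `unknownUnitRefs` and `toSI` are assumptions, not part of the proposal.

```typescript
// Trimmed copies of the proposed interfaces, so the sketch is self-contained.
interface UnitDef { dimension: string; symbol: string; name: string; si_factor: number; }
interface ValueSet { property: string; value: number; unit_ref: string; }
interface PropertyValue { value?: number; value_set?: ValueSet[]; unit_ref?: string; note?: string; }

// Hypothetical unit table; the keys are assumed unit IDs.
const units: Record<string, UnitDef> = {
  K:   { dimension: 'temperature', symbol: 'K',   name: 'kelvin',     si_factor: 1 },
  Pa:  { dimension: 'pressure',    symbol: 'Pa',  name: 'pascal',     si_factor: 1 },
  kPa: { dimension: 'pressure',    symbol: 'kPa', name: 'kilopascal', si_factor: 1000 },
};

// Triple point of water: temperature and pressure apply together.
const triplePoint: PropertyValue = {
  value_set: [
    { property: 'temperature', value: 273.16,   unit_ref: 'K' },
    { property: 'pressure',    value: 0.611657, unit_ref: 'kPa' },
  ],
};

// Collect every unit_ref used by a value that is missing from the unit table.
function unknownUnitRefs(pv: PropertyValue, table: Record<string, UnitDef>): string[] {
  const refs: string[] = (pv.value_set ?? []).map((v) => v.unit_ref);
  if (pv.unit_ref !== undefined) refs.push(pv.unit_ref);
  return refs.filter((r) => !Object.prototype.hasOwnProperty.call(table, r));
}

// Convert to the SI unit, assuming si_factor is purely multiplicative.
function toSI(v: ValueSet, table: Record<string, UnitDef>): number {
  if (!Object.prototype.hasOwnProperty.call(table, v.unit_ref)) {
    throw new Error(`unknown unit: ${v.unit_ref}`);
  }
  return v.value * table[v.unit_ref].si_factor;
}

console.log(unknownUnitRefs(triplePoint, units)); // no dangling unit references
console.log(toSI(triplePoint.value_set![1], units)); // pressure in pascal
```

One thing this sketch surfaces: a purely multiplicative `si_factor` cannot express offset units such as °C, so either those units are avoided in the data or `UnitDef` would need an additional offset field.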
I'm starting this conversation mainly to speed up the process of finding good ideas and gaining confidence in a design.
Call it a brainstorming session.
Concrete ideas / situations / concepts
Extensibility first
This is what I felt may have been the bottleneck of the previous database.
The principle is that every abstraction must be built around the fact that data may be added and, eventually, removed.
Also, accessing data is streamlined, in a sense where I
File splitting.
I see that this option was chosen fairly early; I'm not sure why this design was picked.
Traceability / Sourcing.
Add a way to record the source of any data displayed, be it Wikipedia or any other website. The question then is whether we should record a time of sampling, a specific link, or just the website in general as the source.
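The three options are not mutually exclusive. As a rough sketch, a source record could always carry the broad website and optionally the specific link and sampling time. The `SourceStamp` shape and the `describeSource` helper below are hypothetical names chosen for illustration.

```typescript
// Hypothetical record covering all three options from the question above.
interface SourceStamp {
  site: string;       // broad website, always present
  url?: string;       // specific page, if known
  sampledAt?: string; // date the value was read, ISO 8601 (YYYY-MM-DD)
}

// Build a display label from the most specific information available.
function describeSource(s: SourceStamp): string {
  const where = s.url ?? s.site;
  return s.sampledAt ? `${where} (accessed ${s.sampledAt})` : where;
}

console.log(describeSource({ site: "Wikipedia" }));
console.log(
  describeSource({
    site: "Wikipedia",
    url: "https://en.wikipedia.org/wiki/Oxygen",
    sampledAt: "2025-01-01",
  }),
);
```

Making only `site` mandatory keeps old or coarse entries valid while letting newer entries be as precise as the contributor can manage.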
Educational hints
Where should we point out subtleties? For example, that the value given for the Curie point varies significantly (by a margin of ±5) across sources, for various reasons.
Could we give a trust score for the values we provide? For instance, I could see us saying that the atomic number is extremely reliable, whereas the melting point is heavily influenced by other common variables such as pressure.
How about giving some tips, such as not confusing weight and mass?
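One possible way to derive a trust hint, sketched under heavy assumptions: compare the values reported by several sources and label the property by how far they disagree. The function names, the labels, and the 1 % threshold are all arbitrary choices for illustration, and the sample numbers are made up.

```typescript
// Summarize how much the values reported by different sources disagree.
function sourceSpread(values: number[]): { mean: number; halfRange: number } {
  if (values.length === 0) throw new Error("no values");
  const min = Math.min(...values);
  const max = Math.max(...values);
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  // halfRange is the "plus or minus" margin around the midpoint of the sources.
  return { mean, halfRange: (max - min) / 2 };
}

// Hypothetical trust label; the default 1 % relative tolerance is an assumption.
function trustLabel(values: number[], relTolerance = 0.01): "consistent" | "varies" {
  const { mean, halfRange } = sourceSpread(values);
  if (mean === 0) return halfRange === 0 ? "consistent" : "varies";
  return halfRange / Math.abs(mean) <= relTolerance ? "consistent" : "varies";
}

// Made-up sample values from three imaginary sources.
console.log(sourceSpread([95, 100, 105]));
console.log(trustLabel([95, 100, 105]));
console.log(trustLabel([26, 26, 26]));
```

A score like this only measures agreement between sources, not correctness, so it would complement rather than replace notes about conditions such as pressure.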
Typing and JSON validation
At least for validation. I don't know whether the UI tools we use will support typing.
All in all, it is probably ad hoc and won't be critical to the functioning of the application.
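Even if the UI tooling ignores types, validation can be done at load time with a plain type guard, since TypeScript types do not exist at runtime. A minimal sketch for the `ConditionValue` shape proposed earlier in the thread; the guard name `isConditionValue` and the sample JSON are assumptions.

```typescript
// Copy of the ConditionValue shape from the thread, so the sketch is self-contained.
interface ConditionValue {
  value: number;
  unit_ref: string;
}

// Runtime type guard: checks parsed JSON by hand, because types are erased at runtime.
function isConditionValue(x: unknown): x is ConditionValue {
  if (typeof x !== "object" || x === null) return false;
  const record = x as Record<string, unknown>;
  const value = record["value"];
  const unitRef = record["unit_ref"];
  return typeof value === "number" && isFinite(value) && typeof unitRef === "string";
}

// JSON.parse returns untyped data, so it is treated as unknown until checked.
const parsed: unknown = JSON.parse('{"value": 101.325, "unit_ref": "kPa"}');
if (isConditionValue(parsed)) {
  console.log(parsed.value, parsed.unit_ref);
}
```

The same pattern scales to the larger interfaces, although at that point a JSON Schema checked in CI may be less work than hand-written guards.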
Extension to other storage formats.
For now, the data is meant to be stored as JSON. But maybe later it would fit better inside a .sqlite container? Or compressed with gzip for use in a client-side rendering context?
Storing translations?
Where should we store them?