Important: #105310 - Create CHAR and BINARY as fixed-length columns
See forge#105310
Description
TYPO3 parses ext_
files into a Doctrine DBAL object schema to define
a virtual database scheme, enriched with
Default
information for
TCA-managed tables and fields.
Fixed and variable length variants have been parsed already in the past, but missed
to flag the column as
$fixed = true
for the fixed-length database field types
CHAR
and
BINARY
. This resulted in the wrong creation of these columns as
VARCHAR
and
VARBINARY
, which is now corrected.
ext_tables.sql | created as (before) | created as (now) |
---|---|---|
CHAR(10) | VARCHAR(10) | CHAR(10) |
VARCHAR(10) | VARCHAR(10) | VARCHAR(10) |
BINARY(10) | VARBINARY(10) | BINARY(10) |
VARBINARY(10) | VARBINARY(10) | VARBINARY(10) |
Not all database systems (RDBMS) act the same way for fixed-length columns. Implementation differences need to be respected to ensure the same query/data behaviour across all supported database systems.
Warning
Using fixed-length
CHAR
and
BINARY
column types requires to carefully work
with data being persisted and retrieved from the database due to differently
behaviour specifically of PostgreSQL.
Fixed-length
CHAR
Key Difference Between CHAR and VARCHAR
The main difference between
CHAR
and
VARCHAR
is how the database
stores character data in a database.
CHAR
, which stands for character
,
is a fixed-length data type, meaning it always reserves a specific amount of
storage space for each value, regardless of whether the actual data occupies
that space entirely. For example, if a column is defined as
CHAR
and
the word apple
is stored inside of it, it will still occupy 10 characters worth of
space (not just 5). Unusued characters are padded with extra spaces.
On the other hand,
VARCHAR
, short for variable character
, is a
variable-length data type. It only uses as much storage space as needed
to store the actual data without padding. So, storing the word apple
in a
VARCHAR
column will only occupy 5 characters worth of
space, leaving the remaining table row space available for other data.
The main difference from Postgre
to My
/Maria
/SQLite
is:
Postgre
also returns the filler-spaces for a value not having the
column length (returning apple
).
On top of that, the filled-up spaces are also respected for query conditions, sorting
or data calculations (
concat
for example). These two facts makes a huge
difference and must be carefully taken into account when using
CHAR
field.
Rule of thumb for fixed-length
CHAR
columns
- Only use with ensured fixed-length values (so that no padding occurs).
- For 255 or more characters
VARCHAR
orTEXT
must be used.
More hints for fixed-length
CHAR
columns
- Ensure to write fixed-length values for
CHAR
(non-space characters), for example use hash algorithms which produce fixed-length hash identifier values. -
Ensure to use query statements to
trim
ORright
the value withinPad WHERE
,HAVING
orSELECT
operations, when values are not guaranteed to contain fixed-length values.Tip
Helper
\TYPO3\
expressions can be used, for exampleCMS\ Core\ Database\ Query\ Expression\ Expression Builder \TYPO3\
orCMS\ Core\ Database\ Query\ Expression\ Expression Builder->trim () \TYPO3\
to.CMS\ Core\ Database\ Query\ Expression\ Expression Builder->right Pad () - Usage of
CHAR
must be avoided when using the column with theExtbase ORM
, because fixed-value length cannot be ensured due to the lack of usingtrim/
within the ORM generated queries. Only with ensured fixed-length values, it is usable withright Pad Extbase ORM
. - Cover custom queries extensively with
functional tests
executed against all supported database platforms. Code within public extensions should ensure to test queries and their operations against all officially TYPO3-supported database platforms.
Example of difference in behaviour of fixed-length
CHAR
types
CREATE TABLE `tt_content` (
`some_label` CHAR(10) DEFAULT '' NOT NULL,
);
Now, add some data. One row which fits exactly to 10 characters, and one row that only uses 6 characters:
<?php
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getConnectionForTable('tt_content');
// adding a value with 10 chars
$queryBuilder->insert(
'tt_content',
[
'some_label' => 'some-label',
],
[
'some_label' => Connection::PARAM_STR,
],
);
// adding a value with only 6 chars
$queryBuilder->insert(
'tt_content',
[
'some_label' => 'label1',
],
[
'some_label' => Connection::PARAM_STR,
],
);
Now see the difference in retrieving these records:
<?php
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getQueryBuilderForTable('tt_content');
$rows = $queryBuilder
->select('uid', 'some_label')
->from('tt_content')
->executeQuery()
->fetchAllAssociative();
Depending on the used database platform, the retrieved rows would contain these strings:
<?php
$rows = [
[
'uid' => 1,
'some_label' => 'some-label',
],
[
'uid' => 2,
'some_label' => 'label1',
],
];
but for PostgreSQL
<?php
$rows = [
[
'uid' => 1,
'some_label' => 'some-label',
],
[
'uid' => 2,
// PostgreSQL applies the fixed length to the value directly,
// filling it up with spaces
'some_label' => 'label1 ',
],
];
or as a diff
to make this even more visible:
<?php
$rows = [
[
'uid' => 1,
'some_label' => 'some-label',
],
[
'uid' => 2,
- 'some_label' => 'label1', // MySQL, MariaDB, SQLite
+ 'some_label' => 'label1 ', // PostgreSQL
],
];
Note
Because of this, retrieved values need to be trimmed OR padded AFTER the query results are fetched, to ensure the same retrieved value across all supported database systems. Another option is to ensure that the persisted data always has a fixed-value length, like by using the aforementioned hashing algorithms (making results not human-readable).
To raise the awareness for problems on this topic, using the trimmed value inside
a
WHERE
condition will match the record, but the returned value will be different
from the value used in the condition:
<?php
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getQueryBuilderForTable('tt_content');
$rows = $queryBuilder
->select('uid', 'some_label')
->from('tt_content')
->where(
$queryBuilder->eq(
'some_label',
$queryBuilder->createNamedParameter('label1'), // trimmed value!
),
)
->executeQuery()
->fetchAllAssociative();
// $rows contains the record for
// PostgreSQL: $rows = [['uid' => 2, 'some_label' => 'label1 ']];
// Others....: $rows = [['uid' => 2, 'some_label' => 'label1']];
<?php
use Doctrine\DBAL\Platforms\TrimMode;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getQueryBuilderForTable('tt_content');
$rows = $queryBuilder
->select('uid')
->addSelectLiteral(
$queryBuilder->expr()->as(
$queryBuilder->expr()->trim(
'fixed_title',
TrimMode::TRAILING,
' '
),
'fixed_title',
),
)
->from('tt_content')
->where(
$queryBuilder->eq(
'some_label',
$queryBuilder->createNamedParameter('label1'),
),
)
->executeQuery()
->fetchAllAssociative();
// $rows contains the record for
// PostgreSQL: $rows = [['uid' => 2, 'some_label' => 'label1']];
// Others....: $rows = [['uid' => 2, 'some_label' => 'label1']];
// and ensures the same content across all supported database systems.
On PostgreSQL, performing a query for a space-padded value will not actually return the expected row:
<?php
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;
// PostgreSQL specific query!
$queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
->getQueryBuilderForTable('tt_content');
$rows = $queryBuilder
->select('uid', 'some_label')
->from('tt_content')
->where(
$queryBuilder->eq(
'some_label',
$queryBuilder->createNamedParameter('label1 '), // untrimmed value!
),
)
->executeQuery()
->fetchAllAssociative();
// $rows === []
Additional
Expression
methods can be used to ensure same behaviour on all platforms:
\TYPO3\
CMS\ Core\ Database\ Query\ Expression\ Expression Builder:: trim () \TYPO3\
CMS\ Core\ Database\ Query\ Expression\ Expression Builder:: right Pad ()
Recommendation
CHAR
and
BINARY
fields can be used (for storage or performance adjustments),
but only when composed data and queries take care of database-system differences.
Otherwise, the "safe bet" is to consistently utilize
VARCHAR
and
VARBINARY
columns types.