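Apache Ozone
The idea was first proposed in the HDFS community in the second half of 2014, and work formally started in 2015. Because Ozone evolved much faster than the HDFS release cadence, it was split out of the Hadoop community in 2020 and became a standalone project, today's Apache Ozone. The project's core goals are to address the scalability limits of the HDFS NameNode, to remove the storage bottleneck caused by small files, and to support more than 10 billion files.
- Tencent, Zhao Mingchao: Apache Ozone, Decoding Next-Generation Big Data Storage
- Replacing HDFS? The Latest Research Progress on Ozone at Tencent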
- https://arch-long.cn/articles/hadoop/Ozone.html
install
# wget https://archive.apache.org/dist/ozone/2.0.0/ozone-2.0.0.tar.gz
wget https://mirrors.ustc.edu.cn/apache/ozone/2.0.0/ozone-2.0.0.tar.gz
mkdir -p ozone
tar -xvzf ozone-2.0.0.tar.gz --strip-components 1 -C ./ozone/
./ozone/bin/ozone --help
Usage: ozone [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]

  OPTIONS is none or any of:

    --buildpaths                       attempt to add class files from build tree
    --config dir                       Ozone config directory
    --daemon (start|status|stop)       operate on a daemon
    --debug                            turn on shell script debug mode
    --help                             usage information
    --hostnames list[,of,host,names]   hosts to use in worker mode
    --hosts filename                   list of hosts to use in worker mode
    --jvmargs arguments                append JVM options to any existing options defined in the OZONE_OPTS environment variable. Any defined in OZONE_CLIENT_OPTS will be append after these jvmargs
    --loglevel level                   set the log4j level for this command
    --validate (continue)              validates if all jars as indicated in the corresponding OZONE_RUN_ARTIFACT_NAME classpath file are present, command execution shall continue post validation failure if 'continue' is passed
    --workers                          turn on worker mode

  SUBCOMMAND is one of:

    Admin Commands:

      daemonlog     get/set the log level for each daemon

    Client Commands:

      admin         Ozone admin tool
      checknative   checks if native libraries are loaded
      classpath     prints the class path needed for running ozone commands
      debug         Ozone debug tool
      dtutil        operations related to delegation tokens
      envvars       display computed Hadoop environment variables
      freon         runs an ozone data generator
      fs            run a filesystem command on Ozone file system. Equivalent to 'hadoop fs'
      genconf       generate minimally required ozone configs and output to ozone-site.xml in specified path
      getconf       get ozone config values from configuration
      insight       tool to get runtime operation information
      ratis         Ozone ratis tool
      repair        Ozone repair tool
      s3            command line interface for s3 related operations
      sh            command line interface for object store operations
      tenant        command line interface for multi-tenant related operations
      version       print the version

    Daemon Commands:

      csi           run the standalone CSI daemon
      datanode      run a HDDS datanode
      httpfs        run the HTTPFS compatible REST gateway
      om            Ozone Manager
      recon         run the Recon service
      s3g           run the S3 compatible REST gateway
      scm           run the Storage Container Manager service

SUBCOMMAND may print help when invoked w/o parameters or with -h.
ozone-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
<property>
<name>ozone.om.address</name>
<value>localhost</value>
<tag>OM, REQUIRED</tag>
<description>
The address of the Ozone OM service. This allows clients to discover
the address of the OM.
</description>
</property>
<property>
<name>ozone.metadata.dirs</name>
<value>/tmp</value>
<tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag>
<description>
This setting is the fallback location for SCM, OM, Recon and DataNodes
to store their metadata. This setting may be used only in test/PoC
clusters to simplify configuration.
For production clusters or any time you care about performance, it is
recommended that ozone.om.db.dirs, ozone.scm.db.dirs and
hdds.container.ratis.datanode.storage.dir be configured separately.
</description>
</property>
<property>
<name>ozone.scm.client.address</name>
<value>localhost</value>
<tag>OZONE, SCM, REQUIRED</tag>
<description>
The address of the Ozone SCM client service. This is a required setting.
It is a string in the host:port format. The port number is optional
and defaults to 9860.
</description>
</property>
<property>
<name>ozone.scm.names</name>
<value>localhost</value>
<tag>OZONE, REQUIRED</tag>
<description>
The value of this property is a set of DNS | DNS:PORT | IP
Address | IP:PORT. Written as a comma separated string. e.g. scm1,
scm2:8020, 7.7.7.7:7777.
This property allows datanodes to discover where SCM is, so that
datanodes can send heartbeat to SCM.
</description>
</property>
</configuration>
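To double-check a hand-written ozone-site.xml before starting any daemon, the file can be read back as name/value pairs. The following is only a minimal sketch: the helper names `read_properties` and `missing_required` are hypothetical, the embedded sample is a shortened copy of the file above, and the "required" list simply mirrors the four properties tagged REQUIRED in it.

```python
import xml.etree.ElementTree as ET

# Shortened sample; in practice read ${OZONE_CONF_DIR}/ozone-site.xml instead.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
  <property>
    <name>ozone.om.address</name>
    <value>localhost</value>
  </property>
  <property>
    <name>ozone.metadata.dirs</name>
    <value>/tmp</value>
  </property>
</configuration>
"""

# The four properties tagged REQUIRED in the file above.
REQUIRED = (
    "ozone.om.address",
    "ozone.metadata.dirs",
    "ozone.scm.client.address",
    "ozone.scm.names",
)


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_required(props, required=REQUIRED):
    """List required keys that are absent or empty, in declaration order."""
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    properties = read_properties(SAMPLE)
    print(properties)
    print("missing:", missing_required(properties))
```

Run against the full file above, `missing_required` should return an empty list, since all four properties are set there.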
How environment variables such as OZONE-SITE.XML_ are supported
https://github.com/apache/ozone-docker-runner
https://github.com/apache/ozone-docker-runner/blob/master/Dockerfile
https://github.com/apache/ozone-docker
https://github.com/apache/ozone-docker/blob/master/Dockerfile
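Digging through the distribution shows that the ${OZONE_HOME}/bin/ozone script calls helper scripts under ${OZONE_HOME}/libexec/, such as ozone-functions.sh, ozone-config.sh and entrypoint.sh. These scripts also reference environment variables such as WAITFOR and ENSURE_OM_INITIALIZED.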
The call chain, where each item is invoked by the one after it:
libexec/envtoconf.py
--> libexec/entrypoint.sh
--> ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "entrypoint.sh"]
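This means that OZONE-SITE.XML_ style environment variables only work in the Docker environment, where entrypoint.sh runs. A deployment from the binary tarball does not support them; write ${OZONE_CONF_DIR}/ozone-site.xml directly instead.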
The mapping from environment variables to XML files is handled mainly by the Python script libexec/envtoconf.py.
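The idea behind that mapping can be sketched in a few lines of Python. This is a simplified illustration and not the actual envtoconf.py: the pattern `ENV_PATTERN` and the helpers `group_by_file` and `render_xml` are hypothetical names, and it is an assumption here that the part before the first underscore names the target file (lowercased) and the rest is the property key.

```python
import re
from xml.sax.saxutils import escape

# Matches names such as "OZONE-SITE.XML_ozone.om.address":
# group 1 is the target file, group 2 the property key.
ENV_PATTERN = re.compile(r"^([A-Za-z0-9-]+\.XML)_(.+)$")


def group_by_file(environ):
    """Collect matching variables as {filename: {property: value}}."""
    files = {}
    for name, value in environ.items():
        match = ENV_PATTERN.match(name)
        if not match:
            continue  # ordinary variables such as PATH are ignored
        filename = match.group(1).lower()
        files.setdefault(filename, {})[match.group(2)] = value
    return files


def render_xml(props):
    """Render one property map as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for key in sorted(props):
        lines.append(
            "<property><name>%s</name><value>%s</value></property>"
            % (escape(key), escape(props[key]))
        )
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample_env = {
        "OZONE-SITE.XML_ozone.om.address": "om",
        "OZONE-SITE.XML_ozone.scm.names": "scm",
        "PATH": "/usr/bin",
    }
    for filename, props in group_by_file(sample_env).items():
        print("#", filename)
        print(render_xml(props))
```

Names like these contain dots and hyphens, so they are not valid shell identifiers and cannot be set with `export`; this fits the observation that they are passed in by the container runtime rather than by a shell profile.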
The container, however, contains two entrypoint.sh files:
bash-5.1$ whereis entrypoint.sh
entrypoint.sh: /usr/local/bin/entrypoint.sh /opt/hadoop/libexec/entrypoint.sh
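So which one does ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "entrypoint.sh"] invoke: only one of them, or both?
Testing shows that only one takes effect, and it is /opt/hadoop/libexec/entrypoint.sh. The Dockerfile used for the test inserts a different echo marker before each exec in the two scripts, so the container output shows which script ran: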
FROM apache/ozone:2.0.0
USER root
RUN \
sed -i "s@exec@echo \"XXXXXXXXXXXX\";exec@g" /usr/local/bin/entrypoint.sh; \
sed -i "s@exec@echo \"DDDDDDDDDDDD\";exec@g" /opt/hadoop/libexec/entrypoint.sh
USER hadoop
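One plausible explanation for this result is plain PATH lookup: the ENTRYPOINT passes the bare name entrypoint.sh, so whichever directory comes first on PATH wins. The sketch below only reproduces that lookup rule with two temporary directories. The directory names are stand-ins, and it is an assumption (not verified here) that /opt/hadoop/libexec precedes /usr/local/bin on the image's PATH.

```python
import os
import shutil
import stat
import tempfile


def make_script(directory, name="entrypoint.sh"):
    """Create an executable placeholder script and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho placeholder\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(name, path_dirs):
    """Return the first executable called `name` in the given PATH order."""
    return shutil.which(name, path=os.pathsep.join(path_dirs))


def demo():
    """Resolve entrypoint.sh under two opposite PATH orders."""
    with tempfile.TemporaryDirectory() as root:
        libexec = os.path.join(root, "libexec")      # stands in for /opt/hadoop/libexec
        local_bin = os.path.join(root, "local_bin")  # stands in for /usr/local/bin
        make_script(libexec)
        make_script(local_bin)
        libexec_first = resolve("entrypoint.sh", [libexec, local_bin])
        local_first = resolve("entrypoint.sh", [local_bin, libexec])
        return (
            os.path.basename(os.path.dirname(libexec_first)),
            os.path.basename(os.path.dirname(local_first)),
        )


if __name__ == "__main__":
    print(demo())
```

Inside the real container, `echo $PATH` together with `command -v entrypoint.sh` would confirm or refute this assumption.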
OZONE-SITE.XML_ozone.om.address=om
OZONE-SITE.XML_ozone.om.http-address=om:9874
OZONE-SITE.XML_ozone.scm.names=scm
OZONE-SITE.XML_ozone.scm.datanode.id=/data/datanode.id
OZONE-SITE.XML_ozone.scm.block.client.address=scm
OZONE-SITE.XML_ozone.metadata.dirs=/data/metadata
OZONE-SITE.XML_ozone.handler.type=distributed
OZONE-SITE.XML_ozone.scm.client.address=scm
OZONE-SITE.XML_ozone.scm.dead.node.interval=5m
OZONE-SITE.XML_ozone.replication=1
OZONE-SITE.XML_hdds.datanode.dir=/data/hdds
OZONE-SITE.XML_hdds.scmclient.max.retry.timeout=30s