flink
classloader.resolve-order: child-first # parent-first
#classloader.parent-first-patterns.default:
classloader.parent-first-patterns.additional: com.codahale.metrics
java.lang.LinkageError: loader constraint violation: when resolving method 'void org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper.<init>(com.codahale.metrics.Histogram)' the class loader org.apache.flink.util.ChildFirstClassLoader @1f4f7e8c of the current class, org/apache/iceberg/flink/sink/IcebergStreamWriterMetrics, and the class loader 'app' for the method's defining class, org/apache/flink/dropwizard/metrics/DropwizardHistogramWrapper, have different Class objects for the type com/codahale/metrics/Histogram used in the signature (org.apache.iceberg.flink.sink.IcebergStreamWriterMetrics is in unnamed module of loader org.apache.flink.util.ChildFirstClassLoader @1f4f7e8c, parent loader 'app'; org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper is in unnamed module of loader 'app')
at org.apache.iceberg.flink.sink.IcebergStreamWriterMetrics.<init>(IcebergStreamWriterMetrics.java:54)
at org.apache.iceberg.flink.sink.IcebergStreamWriter.open(IcebergStreamWriter.java:56)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateAndGates(StreamTask.java:858)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$restoreInternal$5(StreamTask.java:812)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:812)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:771)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:970)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:939)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:763)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.base/java.lang.Thread.run(Unknown Source)
spark
--conf "spark.driver.extraClassPath="
--conf "spark.executor.extraClassPath="
--conf "spark.driver.extraJavaOptions="
--conf "spark.executor.extraJavaOptions="
--conf "spark.driver.extraLibraryPath="
--conf "spark.executor.extraLibraryPath="
--conf "spark.driver.userClassPathFirst=false"
--conf "spark.executor.userClassPathFirst=false"
MutableURLClassLoader vs AppClassLoader
- org.apache.spark.util.MutableURLClassLoader
java.net.URLClassLoader<—org.apache.spark.util.MutableURLClassLoader<—org.apache.spark.util.ChildFirstURLClassLoader
三方包
spark加载依赖包
## 从ivy下载
bin/spark-submit \
--conf "spark.jars.packages=cn.hutool:hutool-all:5.8.39" \
jar-file
## 25/07/29 03:10:56 INFO SparkContext: Added JAR file:///opt/bitnami/spark/.ivy2/jars/cn.hutool_hutool-all-5.8.39.jar at spark://192.168.66.115:24911/jars/cn.hutool_hutool-all-5.8.39.jar with timestamp 1753758656002
使用 spark-submit 时,应用程序 jar 以及 --jars
选项中包含的任何 jar 将自动传输到集群。 在 –jars 之后提供的 URL 必须用逗号分隔。该列表包含在驱动程序和执行程序类路径中。目录扩展不适用于 --jars
。Spark 使用以下 URL 方案来允许不同的策略来传播 jar:
- file: – Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
- hdfs:, http:, https:, ftp: – these pull down files and JARs from the URI as expected
- local: – a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and works well for -large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.
flink加载依赖包
bin/flink run \
--classpath https://maven.aliyun.com/repository/public/cn/hutool/hutool-all/5.8.39/hutool-all-5.8.39.jar \
jar-file
注: bin/spark-submit
命令中的jar-file 支持http, bin/flink
命令中的jar-file不支持http
spark必会知识
- Spark Thrift Server (STS)
- Adaptive Query Execution (AQE)
- Dynamic Resource Allocation,简称DRA)
- spark shuffle
- Storage Partition Join (SPJ)
- 在 Spark SQL 中,catalog.database.table 这种结构被称为 表的三段式标识符(three-part identifier),它用于唯一标识一个表。